In this project, the daily sales of several products sold on the e-commerce platform 'Trendyol' are forecast. The sold count of each product is examined and decomposed, several forecasting strategies are developed, and the best one is selected according to its weighted mean absolute percentage error (WMAPE). The data before 29 May 2021 form the training set for the models, and the data from 29 May to 11 June 2021 form the test set. Nine products are examined:
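A minimal sketch of the date-based split, assuming a data frame named `sold` with an `event_date` column (the toy data below stand in for the real dataset, which has many more columns):

```r
# Illustrative train/test split on the date boundary used in the report.
# 'sold' is a stand-in data frame; the real dataset has more columns.
sold <- data.frame(
  event_date = seq(as.Date("2021-05-25"), as.Date("2021-06-11"), by = "day"),
  sold_count = 1:18
)
train <- sold[sold$event_date <  as.Date("2021-05-29"), ]
test  <- sold[sold$event_date >= as.Date("2021-05-29"), ]
nrow(test)  # 14 days, matching n = 14 in the accuracy tables below
```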
Since campaign dates matter for sales and most sales peaks occur during campaigns, Trendyol's campaign dates were collected from Trendyol's website as external data and included as the input attribute 'is_campaign'.
Before building forecasting models, the data should be plotted and its trend and seasonality examined. Below, you can see the plot of the sales quantity of Product 1. There is a slightly increasing trend, especially in the middle of the plot, but no significant seasonality is visible. For a closer look, three months of 2021 (March, April and May) are plotted. The seasonality is still not very pronounced, but sales are higher at the beginning of each month and decrease toward its end, which suggests a monthly seasonality.
The first type of model is linear regression. First of all, it is wise to select helpful attributes from the correlation matrix. Below, you can see the correlations between the attributes. According to this matrix, category_sold, category_favored, and basket_count can be added to the model.
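Attribute selection from the correlation matrix can be sketched as follows; the data are simulated stand-ins, and only the column names follow the report:

```r
set.seed(1)
# Toy stand-in data; only the column names follow the report.
sold <- data.frame(
  sold_count       = rpois(100, 50),
  category_sold    = rpois(100, 400),
  category_favored = rpois(100, 900),
  basket_count     = rpois(100, 120)
)
# Correlation of every attribute with the target, ranked by strength
cors <- cor(sold)["sold_count", ]
sort(abs(cors[setdiff(names(cors), "sold_count")]), decreasing = TRUE)
```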
In the first model, these attributes are used as regressors. The adjusted R-squared value indicates how well the model fits; for the first model it is quite high, which is a good sign. However, there are outliers, probably due to campaigns and holidays, which can be accounted for to improve the model. Finally, a 'lag1' attribute is added because the autocorrelation at lag 1 is very high in the ACF. In the final linear regression model, the adjusted R-squared is high enough and the residual plots look good enough to make predictions.
##
## Call:
## lm(formula = sold_count ~ category_sold + category_favored +
## basket_count, data = sold)
##
## Residuals:
## Min 1Q Median 3Q Max
## -86.278 -11.238 -0.387 8.763 168.980
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.7442865 2.8040394 1.692 0.0915 .
## category_sold 0.1187613 0.0062677 18.948 < 2e-16 ***
## category_favored -0.0015302 0.0002083 -7.347 1.34e-12 ***
## basket_count 0.1407651 0.0090971 15.474 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.46 on 365 degrees of freedom
## Multiple R-squared: 0.8403, Adjusted R-squared: 0.839
## F-statistic: 640.4 on 3 and 365 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 140.82, df = 10, p-value < 2.2e-16
## sold_count
## Min. : 14.00
## 1st Qu.: 33.00
## Median : 56.00
## Mean : 74.17
## 3rd Qu.: 89.00
## Max. :447.00
##
## Call:
## lm(formula = sold_count ~ big_outlier + category_sold + category_favored +
## basket_count, data = sold)
##
## Residuals:
## Min 1Q Median 3Q Max
## -80.651 -8.335 -1.034 8.277 121.209
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.5878617 2.3643596 4.901 1.44e-06 ***
## big_outlier 76.5329182 5.7826657 13.235 < 2e-16 ***
## category_sold 0.0867377 0.0056964 15.227 < 2e-16 ***
## category_favored -0.0008900 0.0001781 -4.998 9.01e-07 ***
## basket_count 0.1075103 0.0078954 13.617 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.95 on 364 degrees of freedom
## Multiple R-squared: 0.8922, Adjusted R-squared: 0.891
## F-statistic: 753.2 on 4 and 364 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 112.47, df = 10, p-value < 2.2e-16
##
## Call:
## lm(formula = sold_count ~ lag1 + big_outlier + category_sold +
## category_favored + basket_count, data = sold)
##
## Residuals:
## Min 1Q Median 3Q Max
## -78.630 -7.746 -0.706 7.253 123.997
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.9269325 2.0130544 4.931 1.25e-06 ***
## lag1 0.5443102 0.0457488 11.898 < 2e-16 ***
## big_outlier 63.1752763 5.0382831 12.539 < 2e-16 ***
## category_sold 0.0940932 0.0048777 19.290 < 2e-16 ***
## category_favored -0.0009748 0.0001514 -6.438 3.84e-10 ***
## basket_count 0.1106151 0.0067112 16.482 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.79 on 363 degrees of freedom
## Multiple R-squared: 0.9225, Adjusted R-squared: 0.9214
## F-statistic: 863.6 on 5 and 363 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 18.357, df = 10, p-value = 0.04924
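The three regressions above can be sketched as follows. The data are simulated stand-ins with the report's column names, and the outlier threshold of 2.5 standardized residuals is an assumption for illustration:

```r
set.seed(42)
n <- 369  # the training-set size implied by the residual degrees of freedom
sold <- data.frame(
  category_sold    = rpois(n, 400),
  category_favored = rpois(n, 9000),
  basket_count     = rpois(n, 300)
)
sold$sold_count <- 5 + 0.12 * sold$category_sold + 0.14 * sold$basket_count -
  0.001 * sold$category_favored + rnorm(n, sd = 25)

# 1) Base model with the correlation-selected regressors
fit1 <- lm(sold_count ~ category_sold + category_favored + basket_count,
           data = sold)

# 2) Dummy out large residuals (campaign/holiday spikes)
sold$big_outlier <- as.numeric(abs(rstandard(fit1)) > 2.5)
fit2 <- update(fit1, . ~ . + big_outlier)

# 3) Add the first lag of the response, motivated by the high ACF at lag 1
sold$lag1 <- c(NA, head(sold$sold_count, -1))
fit3 <- update(fit2, . ~ . + lag1)

summary(fit3)$adj.r.squared
```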
The second type of model is ARIMA. For this model, the data should first be decomposed, which requires choosing a frequency. Since there is no significant seasonality, the lag with the highest value in the ACF, 63, is chosen. Additive decomposition is used for this task. Below, the random series can be seen.
After the decomposition, the (p,d,q) orders should be chosen by examining the ACF and PACF. The ACF suggests q = 1 or q = 7, and the PACF suggests p = 1. The auto.arima function is used as well. The AIC and BIC values of the candidate models can be seen below; smaller AIC and BIC values mean a better model. By this criterion, the (2,0,2) model suggested by auto.arima is the best among them. After the order is selected, the regressors most correlated with the sold count are added to the model to improve it. The final model has lower AIC and BIC values, so we can proceed with it.
##
## Call:
## arima(x = detrend, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## 0.6650 0.0123 -1.5566
## s.e. 0.0574 0.0702 6.0436
##
## sigma^2 estimated as 1244: log likelihood = -1529.77, aic = 3067.54
## [1] 3067.536
## [1] 3082.443
##
## Call:
## arima(x = detrend, order = c(1, 0, 7))
##
## Coefficients:
## ar1 ma1 ma2 ma3 ma4 ma5 ma6 ma7
## 0.8658 -0.2496 -0.0680 -0.1138 -0.2193 -0.1632 -0.0457 -0.1405
## s.e. 0.0427 0.0696 0.0622 0.0643 0.0589 0.0551 0.0697 0.0702
## intercept
## -0.4768
## s.e. 0.5468
##
## sigma^2 estimated as 1129: log likelihood = -1516.43, aic = 3052.87
## [1] 3052.868
## [1] 3090.136
## Series: detrend
## ARIMA(2,0,2) with zero mean
##
## Coefficients:
## ar1 ar2 ma1 ma2
## 1.5221 -0.6871 -0.8673 0.1966
## s.e. 0.1703 0.0984 0.1811 0.0930
##
## sigma^2 estimated as 1201: log likelihood=-1522.43
## AIC=3054.86 AICc=3055.06 BIC=3073.5
## [1] 3054.864
## [1] 3073.498
##
## Call:
## arima(x = detrend, order = c(2, 0, 2), xreg = xreg)
##
## Coefficients:
## ar1 ar2 ma1 ma2 intercept xreg1 xreg2
## 0.8477 -0.1219 -0.1993 0.1917 -52.5534 0.1673 -2e-04
## s.e. 0.2838 0.2328 0.2780 0.0934 7.9501 0.0180 3e-04
##
## sigma^2 estimated as 780.6: log likelihood = -1458.35, aic = 2932.71
## [1] 2932.707
## [1] 2962.521
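The decompose-then-fit procedure described above can be sketched as follows, on a simulated stand-in series; the forecast package is assumed for auto.arima, and the xreg columns are placeholders for the correlated regressors:

```r
library(forecast)

set.seed(7)
# Simulated stand-in for the sales series; frequency 63 as chosen in the text
y <- ts(50 + 10 * sin(2 * pi * seq_len(378) / 63) +
          as.numeric(arima.sim(list(ar = 0.6), 378, sd = 5)),
        frequency = 63)

dec     <- decompose(y, type = "additive")
detrend <- na.omit(dec$random)            # the "random" series

fit_acf  <- arima(detrend, order = c(1, 0, 1))  # order read off the ACF/PACF
fit_auto <- auto.arima(detrend)                 # automatic order search

# Lower AIC/BIC is better
rbind(acf_pick = c(AIC(fit_acf), BIC(fit_acf)),
      auto     = c(AIC(fit_auto), BIC(fit_auto)))

# External regressors most correlated with the target can then be added:
xreg  <- cbind(x1 = rnorm(length(detrend)), x2 = rnorm(length(detrend)))
fit_x <- arima(detrend, order = c(2, 0, 2), xreg = xreg)
```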
We selected two models for prediction; their accuracy measures can be seen here. According to the box plot, the daily errors of the linear model have higher variance, especially toward the end of the test period. The ARIMA model should be chosen because its WMAPE is lower, which is a sign of a better model.
## variable n mean sd CV FBias MAPE RMSE
## 1: lm_prediction 14 83.35714 17.09074 0.2050303 -0.72352232 0.8010225 109.8228
## 2: selected_arima 14 83.35714 17.09074 0.2050303 -0.03885441 0.3287008 35.2479
## MAD MADP WMAPE
## 1: 63.38325 0.7603817 0.7603817
## 2: 26.33523 0.3159325 0.3159325
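The WMAPE used in the comparison (equal to the MADP column above) is the sum of absolute errors divided by the sum of absolute actuals; a minimal sketch:

```r
# WMAPE = sum(|actual - forecast|) / sum(|actual|)
wmape <- function(actual, forecast) {
  sum(abs(actual - forecast)) / sum(abs(actual))
}

actual   <- c(100, 80, 120, 90)
forecast <- c(110, 75, 100, 95)
wmape(actual, forecast)  # 40 / 390, about 0.1026
```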
To conclude, here is a plot of the actual test set against the predicted values of the chosen model. As can be seen, the predictions are fairly accurate.
Before building forecasting models for Product 2, the data should again be plotted and its trend and seasonality examined. Below, you can see the plot of the sales quantity of Product 2. There is no significant trend, and no significant seasonality can be seen either. For a closer look, three months of 2021 (March, April and May) are plotted. The seasonality is still not significant, though there is a spike at the beginning of each month. In May, there is a large rise, probably due to Covid-19 conditions. In conclusion, there may be a monthly seasonality, but it is not very clear.
As for Product 1, the first type of model is linear regression, with candidate attributes selected from the correlation matrix. Below, you can see the correlations between the attributes. According to this matrix, category_sold, category_visits, and basket_count can be added to the model.
In the first model, these attributes are used as regressors. The adjusted R-squared value is again quite high, which is a good sign, but there are outliers, probably due to campaigns and holidays, which can be accounted for. Finally, a 'lag1' attribute is added because the autocorrelation at lag 1 is very high in the ACF. In the final linear regression model, the adjusted R-squared is high enough and the residual plots look good enough to make predictions.
##
## Call:
## lm(formula = sold_count ~ category_sold + category_visits + basket_count,
## data = sold)
##
## Residuals:
## Min 1Q Median 3Q Max
## -422.40 -60.15 1.95 63.20 1208.91
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -60.17090 11.42700 -5.266 2.39e-07 ***
## category_sold 0.14185 0.02200 6.449 3.58e-10 ***
## category_visits 0.00693 0.01256 0.552 0.581
## basket_count 0.18780 0.01162 16.161 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 128.9 on 365 degrees of freedom
## Multiple R-squared: 0.9068, Adjusted R-squared: 0.906
## F-statistic: 1183 on 3 and 365 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 125.12, df = 10, p-value < 2.2e-16
## sold_count
## Min. : 30.0
## 1st Qu.: 165.0
## Median : 238.0
## Mean : 381.4
## 3rd Qu.: 431.0
## Max. :4191.0
##
## Call:
## lm(formula = sold_count ~ big_outlier + category_sold + category_visits +
## basket_count, data = sold)
##
## Residuals:
## Min 1Q Median 3Q Max
## -356.35 -52.28 10.07 53.54 1315.86
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.148e+01 1.241e+01 -1.730 0.0845 .
## big_outlier 2.303e+02 3.592e+01 6.410 4.51e-10 ***
## category_sold 1.425e-01 2.088e-02 6.824 3.71e-11 ***
## category_visits -4.873e-04 1.198e-02 -0.041 0.9676
## basket_count 1.477e-01 1.268e-02 11.655 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 122.4 on 364 degrees of freedom
## Multiple R-squared: 0.9162, Adjusted R-squared: 0.9153
## F-statistic: 995.2 on 4 and 364 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 95.607, df = 10, p-value = 4.11e-16
##
## Call:
## lm(formula = sold_count ~ lag1 + big_outlier + category_sold +
## category_visits + basket_count, data = sold)
##
## Residuals:
## Min 1Q Median 3Q Max
## -381.58 -37.12 4.89 39.84 1334.45
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -40.28635 11.39508 -3.535 0.00046 ***
## lag1 0.44599 0.04880 9.140 < 2e-16 ***
## big_outlier 178.62606 32.91952 5.426 1.05e-07 ***
## category_sold 0.13014 0.01890 6.886 2.54e-11 ***
## category_visits 0.01271 0.01091 1.165 0.24494
## basket_count 0.15168 0.01145 13.244 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 110.5 on 363 degrees of freedom
## Multiple R-squared: 0.9319, Adjusted R-squared: 0.931
## F-statistic: 993.4 on 5 and 363 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 74.502, df = 10, p-value = 5.947e-12
The second type of model is again ARIMA. The data should first be decomposed, which requires choosing a frequency. Since there is no significant seasonality, the lag with the highest value in the ACF, 34, is chosen. Additive decomposition is used. Below, the random series can be seen.
After the decomposition, the (p,d,q) orders should be chosen by examining the ACF and PACF. The ACF suggests q = 1 or q = 11, and the PACF suggests p = 1. The auto.arima function is used as well. The AIC and BIC values of the candidate models can be seen below. By these values, the (1,0,11) model is the best among them. After the order is selected, the regressors most correlated with the sold count are added to the model to improve it. The final model has lower AIC and BIC values, so we can proceed with it.
##
## Call:
## arima(x = detrend, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## 0.5985 0.1204 -2.2120
## s.e. 0.0598 0.0686 45.0812
##
## sigma^2 estimated as 88277: log likelihood = -2383.17, aic = 4774.34
## [1] 4774.343
## [1] 4789.6
##
## Call:
## arima(x = detrend, order = c(1, 0, 11))
##
## Coefficients:
## ar1 ma1 ma2 ma3 ma4 ma5 ma6 ma7
## 0.5115 0.0898 0.0048 -0.1392 -0.1806 -0.2103 -0.1589 -0.1076
## s.e. 0.2066 0.2088 0.1286 0.0770 0.0556 0.0745 0.0945 0.0925
## ma8 ma9 ma10 ma11 intercept
## -0.0942 -0.0735 -0.0572 -0.0731 0.3060
## s.e. 0.0784 0.0771 0.0727 0.0640 2.0291
##
## sigma^2 estimated as 76841: log likelihood = -2361.76, aic = 4751.51
## [1] 4751.515
## [1] 4804.913
## Series: detrend
## ARIMA(3,0,0) with zero mean
##
## Coefficients:
## ar1 ar2 ar3
## 0.7228 -0.0081 -0.1412
## s.e. 0.0540 0.0669 0.0539
##
## sigma^2 estimated as 86941: log likelihood=-2379.15
## AIC=4766.29 AICc=4766.41 BIC=4781.55
## [1] 4766.292
## [1] 4781.549
##
## Call:
## arima(x = detrend, order = c(1, 0, 11), xreg = xreg)
##
## Coefficients:
## ar1 ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8
## 0.5558 0.1483 0.178 0.1079 0.0327 8e-04 0.0653 0.0634 0.0101
## s.e. NaN NaN NaN NaN NaN NaN NaN NaN NaN
## ma9 ma10 ma11 intercept xreg1 xreg2 xreg3
## 0.0076 0.0436 0.0388 -450.0970 0.1404 0.0732 0.0487
## s.e. NaN 0.0533 0.0598 33.0371 0.0164 0.0184 0.0316
##
## sigma^2 estimated as 19786: log likelihood = -2132.8, aic = 4299.6
## [1] 4299.597
## [1] 4364.438
We selected two models for prediction; their accuracy measures can be seen here. According to the box plot, the errors of the ARIMA model are higher. The linear model should be chosen because its WMAPE is lower, which is a sign of a better model.
## variable n mean sd CV FBias MAPE RMSE
## 1: lm_prediction 14 542.4286 335.978 0.6193958 -0.1358889 0.2050354 263.4115
## 2: selected_arima 14 542.4286 335.978 0.6193958 0.8441860 0.8331456 649.5670
## MAD MADP WMAPE
## 1: 115.9278 0.2137200 0.2137200
## 2: 512.1721 0.9442203 0.9442203
To conclude, here is a plot of the actual test set against the predicted values of the chosen model. As can be seen, the predictions are fairly accurate.
Looking at the plots of the product below: in the line graph it can be observed that the sales are volatile, the plot has peaks on some dates, and there may be a cyclical behaviour, which would indicate seasonality. For further investigation, the '3 Months Sales of 2021' plot can be examined; no clear repeating pattern is easily observed there.
Looking at the boxplots: in the weekly boxplot the weekday sales seem similar, so daily and weekly seasonality can be investigated further. In the monthly boxplot there is change from month to month, but no clear repeating monthly behaviour. In the histograms, one can observe that the distribution of sales is close to a normal distribution.
First, different ARIMA models are built so that they can be compared on the test set. Before building an ARIMA model, the data should be decomposed, which requires a frequency. Frequencies of 30 and 7 days are selected and the data are decomposed accordingly. In addition, the ACF plot of the data is examined, and a lag with high autocorrelation can be chosen as another trial frequency. Since the variance does not seem to be increasing, additive decomposition is used. Below, the random series can be seen.
Decomposition with 7 Day Freq
The decomposition series above belong to the time series with 7- and 30-day frequency, respectively.
Looking at the ACF plot of the series, the highest ACF value belongs to lag 32, so a time series decomposition with 32-day frequency is also tried.
In time series decomposition, the random part is assumed to be randomly distributed with mean zero and constant variance; to decide on the best frequency, the random parts of the decomposed series should be compared. In this case, the random part of the decomposition with 7-day frequency looks closest to such a series, so it is chosen as the final decomposition.
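Comparing candidate frequencies by the behaviour of their remainder series can be sketched as follows, on a simulated stand-in for the sales series:

```r
set.seed(2)
# Simulated stand-in; in the report x would be the product's sold_count
x <- 50 + 10 * sin(2 * pi * seq_len(448) / 7) +
  as.numeric(arima.sim(list(ar = 0.5), 448, sd = 8))

# Decompose at each trial frequency and inspect the remainder:
# a mean near zero and a small, stable spread point to a good fit
for (f in c(7, 30, 32)) {
  dec <- decompose(ts(x, frequency = f), type = "additive")
  r   <- na.omit(dec$random)
  cat(sprintf("freq %2d: remainder mean = %6.3f, sd = %6.2f\n",
              f, mean(r), sd(r)))
}
```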
After the decomposition, the (p,d,q) orders should be chosen by examining the ACF and PACF: peaks in the ACF suggest candidate q values and peaks in the PACF suggest candidate p values. Looking at the ACF, q = 3 or q = 4 may be selected, and looking at the PACF, p = 3 or p = 9. The auto.arima function is used as well. The AIC and BIC values of the candidate models can be seen below; smaller AIC and BIC values mean a better model. By this criterion, the (3,0,4) model chosen from the ACF and PACF plots is the best among them.
##
## Call:
## arima(x = detrend, order = c(3, 0, 3))
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 intercept
## 0.3628 0.1249 -0.3545 -0.5395 -0.4382 -0.0223 -0.0160
## s.e. 0.1570 0.2211 0.1415 0.1652 0.2446 0.2010 0.0715
##
## sigma^2 estimated as 9123: log likelihood = -2376.31, aic = 4768.62
## [1] 4768.62
## [1] 4800.491
##
## Call:
## arima(x = detrend, order = c(3, 0, 4))
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 ma4 intercept
## 1.0944 -0.3181 0.0127 -1.3286 0.1373 -0.2304 0.4219 -0.0193
## s.e. 0.1386 0.1984 0.0992 0.1290 0.2123 0.1199 0.0750 0.0142
##
## sigma^2 estimated as 8476: log likelihood = -2363.65, aic = 4745.3
## [1] 4745.295
## [1] 4781.151
##
## Call:
## arima(x = detrend, order = c(9, 0, 4))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8 ar9
## 0.5188 0.3226 0.0766 -0.4315 0.2393 0.0072 -0.0727 0.1220 -0.0980
## s.e. 0.6411 0.5899 0.3413 0.5940 0.2110 0.2125 0.2393 0.0627 0.1065
## ma1 ma2 ma3 ma4 intercept
## -0.7461 -0.6490 -0.3833 0.7785 -0.0194
## s.e. 0.6119 0.7234 0.4355 0.5112 0.0133
##
## sigma^2 estimated as 8335: log likelihood = -2360.63, aic = 4751.26
## [1] 4751.256
## [1] 4811.016
##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,0,2) with non-zero mean : 4793.392
## ARIMA(0,0,0) with non-zero mean : 4925.4
## ARIMA(1,0,0) with non-zero mean : 4919.53
## ARIMA(0,0,1) with non-zero mean : 4916.887
## ARIMA(0,0,0) with zero mean : 4923.38
## ARIMA(1,0,2) with non-zero mean : Inf
## ARIMA(2,0,1) with non-zero mean : 4793.637
## ARIMA(3,0,2) with non-zero mean : Inf
## ARIMA(2,0,3) with non-zero mean : Inf
## ARIMA(1,0,1) with non-zero mean : 4919.128
## ARIMA(1,0,3) with non-zero mean : Inf
## ARIMA(3,0,1) with non-zero mean : Inf
## ARIMA(3,0,3) with non-zero mean : Inf
## ARIMA(2,0,2) with zero mean : 4791.758
## ARIMA(1,0,2) with zero mean : 4818.262
## ARIMA(2,0,1) with zero mean : 4792.121
## ARIMA(3,0,2) with zero mean : Inf
## ARIMA(2,0,3) with zero mean : Inf
## ARIMA(1,0,1) with zero mean : 4917.087
## ARIMA(1,0,3) with zero mean : Inf
## ARIMA(3,0,1) with zero mean : Inf
## ARIMA(3,0,3) with zero mean : Inf
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(2,0,2) with zero mean : Inf
## ARIMA(2,0,1) with zero mean : Inf
## ARIMA(2,0,2) with non-zero mean : Inf
## ARIMA(2,0,1) with non-zero mean : Inf
## ARIMA(1,0,2) with zero mean : Inf
## ARIMA(0,0,1) with non-zero mean : 4916.897
##
## Best model: ARIMA(0,0,1) with non-zero mean
## Series: detrend
## ARIMA(0,0,1) with non-zero mean
##
## Coefficients:
## ma1 mean
## 0.1699 -0.1507
## s.e. 0.0485 6.8933
##
## sigma^2 estimated as 13863: log likelihood=-2455.42
## AIC=4916.84 AICc=4916.9 BIC=4928.79
## [1] 4916.836
## [1] 4928.787
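The AIC/BIC comparison of the hand-picked orders above can be scripted as below; `detrend` is a simulated stand-in for the remainder series:

```r
set.seed(3)
# Simulated stand-in for the remainder of the chosen decomposition
detrend <- as.numeric(arima.sim(list(ar = c(0.5, 0.2)), 500, sd = 90))

# Fit each candidate order and report its information criteria
for (o in list(c(3, 0, 3), c(3, 0, 4))) {
  fit <- arima(detrend, order = o)
  cat(sprintf("ARIMA(%d,%d,%d): AIC = %.1f, BIC = %.1f\n",
              o[1], o[2], o[3], AIC(fit), BIC(fit)))
}
```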
The second type of model is linear regression. Below, you can see the correlations between the attributes. According to this matrix, basket_count, price_count, visit_count and favored_count can be added to the model. Since the box plots above show a monthly change in the data, month information can also be added to the candidate models.
The performance of the different linear regression and ARIMA models on the test dates is calculated, and the best model is selected accordingly.
## variable n mean sd CV FBias MAPE
## 1: lm_prediction2 14 451.5714 90.71063 0.2008777 -0.02562712 0.09325325
## 2: lm_prediction3 14 451.5714 90.71063 0.2008777 -0.07674217 0.11904792
## 3: lm_prediction4 14 451.5714 90.71063 0.2008777 -0.08393829 0.11670367
## 4: lm_prediction5 14 451.5714 90.71063 0.2008777 -0.11437066 0.12859235
## 5: lm_prediction6 14 451.5714 90.71063 0.2008777 -0.03526619 0.07687833
## 6: lm_prediction7 14 451.5714 90.71063 0.2008777 -0.10621457 0.12427903
## 7: arima_prediction 14 451.5714 90.71063 0.2008777 0.05141121 0.12779687
## 8: sarima_prediction 14 451.5714 90.71063 0.2008777 0.05256333 0.12798436
## 9: selected_arima 14 451.5714 90.71063 0.2008777 0.09418716 0.17941751
## RMSE MAD MADP WMAPE
## 1: 49.35944 40.27963 0.08919881 0.08919881
## 2: 58.72637 50.19184 0.11114927 0.11114927
## 3: 59.66218 49.63226 0.10991009 0.10991009
## 4: 65.04994 56.12976 0.12429875 0.12429875
## 5: 42.38769 32.55791 0.07209913 0.07209913
## 6: 61.13382 53.07757 0.11753969 0.11753969
## 7: 77.45611 61.04713 0.13518821 0.13518821
## 8: 77.46723 61.18399 0.13549128 0.13549128
## 9: 100.82860 81.07444 0.17953847 0.17953847
The smallest weighted mean absolute percentage error is obtained by the linear regression model 'sold_count ~ basket_count + visit_count + as.factor(mon) + as.factor(is_campaign)', so this model is selected for our prediction purposes.
To conclude, here is a plot of the actual test set against the predicted values of the chosen model. As can be seen, the predictions are fairly accurate.
One Day Ahead Prediction with the Selected Model for Product 3
With the selected model, a one-day-ahead prediction can be made using all the data on hand, since a one-day-ahead forecast must be submitted in this competition.
## price event_date product_content_id sold_count visit_count favored_count
## 1: 119.66 2021-07-01 6676673 312 11562 777
## basket_count category_sold category_brand_sold category_visits ty_visits
## 1: 930 4839 752 256832 106491398
## category_basket category_favored w_day mon is_campaign
## 1: 21667 19158 5 7 0
## price event_date product_content_id sold_count visit_count favored_count
## 1: 119.66 2021-07-03 6676673 312 11562 777
## basket_count category_sold category_brand_sold category_visits ty_visits
## 1: 930 4839 752 256832 106491398
## category_basket category_favored w_day mon is_campaign lm_prediction
## 1: 21667 19158 5 7 0 366.5632
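One-day-ahead prediction with the selected linear model amounts to predicting on a one-row data frame of next-day regressor values. A sketch on toy data with the formula's column names (the values in `next_day` echo the table above but are illustrative):

```r
set.seed(11)
# Toy training data with the columns used by the selected formula
n <- 180
sold <- data.frame(
  basket_count = rpois(n, 900),
  visit_count  = rpois(n, 11000),
  mon          = rep(1:7, length.out = n),  # calendar month, used as a factor
  is_campaign  = rbinom(n, 1, 0.1)
)
sold$sold_count <- 20 + 0.2 * sold$basket_count + 0.01 * sold$visit_count +
  60 * sold$is_campaign + rnorm(n, sd = 30)

fit <- lm(sold_count ~ basket_count + visit_count + as.factor(mon) +
            as.factor(is_campaign), data = sold)

# One-day-ahead forecast: predict on a single row of next-day regressors
next_day <- data.frame(basket_count = 930, visit_count = 11562,
                       mon = 7, is_campaign = 0)
predict(fit, newdata = next_day)
```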
Looking at the plots of the product below: in the line graph it can be observed that the sales are volatile, with high outliers on some dates, and there may be a cyclical behaviour, which would indicate seasonality. For further investigation, the '3 Months Sales of 2021' plot can be examined; no clear repeating pattern is easily observed there.
Looking at the boxplots: in the weekly boxplot the weekday sales seem similar, so daily and weekly seasonality can be investigated further. In the monthly boxplot there is change from month to month, but no clear repeating monthly behaviour. In the histograms, one can observe that the distribution of sales is close to a normal distribution.
First, different ARIMA models are built so that they can be compared on the test set. Frequencies of 30 and 7 days are selected and the data are decomposed accordingly. Since the variance does not seem to be increasing, additive decomposition is used. Below, the random series can be seen.
The decomposition series above belong to the time series with 7- and 30-day frequency, respectively. Looking at the ACF plot of the series, the highest ACF value belongs to lag 16, so a decomposition with 16-day frequency is also tried.
In this case, the random part of the decomposition with 16-day frequency looks closest to a randomly distributed series with mean zero and constant variance, so it is chosen as the final decomposition.
Looking at the ACF, q = 5 or q = 7 may be selected, and looking at the PACF, p = 1 or p = 3. The auto.arima function is used as well. The AIC and BIC values of the candidate models can be seen below. The ARIMA(3,0,5) model, selected by inspecting the ACF and PACF plots, has a smaller AIC than the ARIMA(1,0,2) model suggested by auto.arima, so ARIMA(3,0,5) will be used for the performance comparison with the linear models.
##
## Call:
## arima(x = detrend, order = c(3, 0, 7))
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 ma4 ma5 ma6
## 0.829 0.6108 -0.5578 -0.57 -0.7939 0.0527 -0.0214 0.1489 0.0724
## s.e. NaN NaN NaN NaN NaN NaN 0.0743 0.0730 0.0317
## ma7 intercept
## 0.1115 -0.0971
## s.e. 0.0542 0.0720
##
## sigma^2 estimated as 13975: log likelihood = -2400.34, aic = 4824.68
## [1] 4824.677
## [1] 4872.178
##
## Call:
## arima(x = detrend, order = c(3, 0, 5))
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 ma4 ma5
## 0.7748 0.8682 -0.759 -0.5212 -1.0612 0.1521 0.1158 0.3146
## s.e. NaN NaN NaN NaN NaN 0.0821 0.0583 0.0507
## intercept
## -0.0893
## s.e. 0.0823
##
## sigma^2 estimated as 14210: log likelihood = -2403.47, aic = 4826.94
## [1] 4826.937
## [1] 4866.521
##
## Call:
## arima(x = detrend, order = c(1, 0, 5))
##
## Coefficients:
## ar1 ma1 ma2 ma3 ma4 ma5 intercept
## 0.5901 -0.2768 -0.0907 -0.2992 -0.1936 -0.1397 -0.0462
## s.e. 0.0721 0.0779 0.0548 0.0589 0.0598 0.0596 0.3752
##
## sigma^2 estimated as 14996: log likelihood = -2411.97, aic = 4839.94
## [1] 4839.942
## [1] 4871.609
##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,0,2) with non-zero mean : 4853.412
## ARIMA(0,0,0) with non-zero mean : 5003.415
## ARIMA(1,0,0) with non-zero mean : 4904.066
## ARIMA(0,0,1) with non-zero mean : 4925.875
## ARIMA(0,0,0) with zero mean : 5001.402
## ARIMA(1,0,2) with non-zero mean : 4893.675
## ARIMA(2,0,1) with non-zero mean : 4907.233
## ARIMA(3,0,2) with non-zero mean : Inf
## ARIMA(2,0,3) with non-zero mean : 4845.681
## ARIMA(1,0,3) with non-zero mean : 4894.608
## ARIMA(3,0,3) with non-zero mean : Inf
## ARIMA(2,0,4) with non-zero mean : 4847.091
## ARIMA(1,0,4) with non-zero mean : Inf
## ARIMA(3,0,4) with non-zero mean : Inf
## ARIMA(2,0,3) with zero mean : 4843.93
## ARIMA(1,0,3) with zero mean : 4892.558
## ARIMA(2,0,2) with zero mean : 4851.59
## ARIMA(3,0,3) with zero mean : Inf
## ARIMA(2,0,4) with zero mean : 4845.366
## ARIMA(1,0,2) with zero mean : 4891.627
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(3,0,2) with zero mean : Inf
## ARIMA(3,0,4) with zero mean : Inf
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(2,0,3) with zero mean : Inf
## ARIMA(2,0,4) with zero mean : Inf
## ARIMA(2,0,3) with non-zero mean : Inf
## ARIMA(2,0,4) with non-zero mean : Inf
## ARIMA(2,0,2) with zero mean : Inf
## ARIMA(2,0,2) with non-zero mean : Inf
## ARIMA(1,0,2) with zero mean : 4891.046
##
## Best model: ARIMA(1,0,2) with zero mean
## Series: detrend
## ARIMA(1,0,2) with zero mean
##
## Coefficients:
## ar1 ma1 ma2
## 0.1257 0.3647 0.2845
## s.e. 0.1452 0.1389 0.0696
##
## sigma^2 estimated as 17790: log likelihood=-2441.47
## AIC=4890.94 AICc=4891.05 BIC=4906.78
## [1] 4890.942
## [1] 4906.775
Below, you can see the correlations between the attributes. According to this matrix, basket_count, category_favored, is_campaign and category_sold can be added to the model in different combinations. Since the box plots above show a monthly change in the data, month information can also be added to the candidate models.
The performance of the different linear regression and ARIMA models on the test dates is calculated, and the best model is selected accordingly.
## variable n mean sd CV FBias MAPE RMSE
## 1: lm_prediction1 14 21 7.200427 0.3428775 -0.20693883 0.2697431 5.966694
## 2: lm_prediction2 14 21 7.200427 0.3428775 -2.97927177 3.4236791 75.719475
## 3: lm_prediction3 14 21 7.200427 0.3428775 -3.28869474 3.8131788 83.123744
## 4: lm_prediction4 14 21 7.200427 0.3428775 -3.05884773 3.5175993 76.628455
## 5: lm_prediction5 14 21 7.200427 0.3428775 -0.35648820 0.3872554 13.486489
## 6: lm_prediction6 14 21 7.200427 0.3428775 -2.81391925 3.2181353 71.004414
## 7: arima_prediction 14 21 7.200427 0.3428775 -0.09014912 0.2865406 7.276734
## 8: sarima_prediction 14 21 7.200427 0.3428775 0.02528538 0.2798477 7.197155
## 9: selected_arima 14 21 7.200427 0.3428775 0.10692728 0.3737146 9.239168
## MAD MADP WMAPE
## 1: 5.193758 0.2473218 0.2473218
## 2: 62.564707 2.9792718 2.9792718
## 3: 69.062590 3.2886947 3.2886947
## 4: 64.235802 3.0588477 3.0588477
## 5: 8.640406 0.4114479 0.4114479
## 6: 59.092304 2.8139193 2.8139193
## 7: 5.722414 0.2724959 0.2724959
## 8: 5.455298 0.2597761 0.2597761
## 9: 7.505308 0.3573956 0.3573956
The smallest weighted mean absolute percentage error is obtained by the linear regression model 'sold_count ~ basket_count + as.factor(mon)'. However, since it has only two input attributes, a small change in either of them would shift the prediction strongly, so the model with the second smallest WMAPE is chosen instead: an ARIMA(1,1,4) fitted to the series decomposed with 16-day frequency, which is also the model that auto.arima suggested. This model is selected for our prediction purposes.
To conclude, here is a plot of the actual test set against the predicted values of the chosen model. As can be seen, the predictions are not too far off.
With the selected model, a one-day-ahead prediction can be made using all the data on hand, since a one-day-ahead forecast must be submitted in this competition.
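The KPSS output below comes from a unit-root test of the remainder series; with the urca package (assumed), it can be produced as in this sketch on a simulated stand-in:

```r
library(urca)

set.seed(5)
# Simulated stand-in for the remainder series named detrend1 in the report
detrend1 <- as.numeric(arima.sim(list(ar = 0.3), 500, sd = 20))

# KPSS null hypothesis: the series is level-stationary ("mu")
kpss <- ur.kpss(detrend1, type = "mu")
summary(kpss)
# A test statistic below the 5% critical value (0.463) means level
# stationarity is not rejected, as in the report's output.
```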
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0067
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
##
## Call:
## arima(x = detrend1, order = c(1, 1, 4), xreg = data_7061886$is_campaign)
##
## Coefficients:
## ar1 ma1 ma2 ma3 ma4 data_7061886$is_campaign
## -0.0681 -0.3684 -0.1174 -0.1867 -0.3275 21.6371
## s.e. 0.1656 0.1552 0.1043 0.0565 0.0649 5.4080
##
## sigma^2 estimated as 453.3: log likelihood = -1730.7, aic = 3475.4
## [1] 3475.401
## [1] 3503.092
## price event_date product_content_id sold_count visit_count favored_count
## 1: 297.9 2021-07-03 7061886 15 1074 103
## basket_count category_sold category_brand_sold category_visits ty_visits
## 1: 51 886 184 64930 106491398
## category_basket category_favored w_day mon is_campaign arima1_prediction
## 1: 3324 5648 5 7 0 4.456416
Looking at the plots of the product below: in the line graph, the sales show increasing variance, some dates have high outliers, and there may be a cyclical behaviour, which is an indicator of seasonality. For further investigation, the ‘3 Months Sales of 2021’ plot can be examined; no clear repeating pattern is easily observed there.
Looking at the boxplots: in the weekly boxplot, sales on weekdays seem similar, so daily and weekly seasonality can be investigated further. In the monthly boxplot, sales change with respect to the month, although the medians of the months are close to each other; this may be an indicator of monthly seasonality. The histogram shows that the distribution of sales is close to a normal distribution.
First, different ARIMA models are built so that they can be tested on the test set. Frequencies of 7 and 30 days are selected and the data is decomposed accordingly. Since the variance seems to be increasing, a multiplicative decomposition is used. The random components can be seen below.
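The decomposition step can be sketched as below; this assumes `data$sold_count` holds the daily sales series:

```r
# Build time series objects with 7- and 30-day frequencies.
ts7  <- ts(data$sold_count, frequency = 7)
ts30 <- ts(data$sold_count, frequency = 30)

# Multiplicative decomposition, since the variance grows with the level.
dec7  <- decompose(ts7,  type = "multiplicative")
dec30 <- decompose(ts30, type = "multiplicative")

# Plot the random (remainder) components for comparison.
plot(dec7$random)
plot(dec30$random)
```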
The decomposed series above belong to the time series with 7- and 30-day frequencies, respectively. Looking at the ACF plot of the series, the highest ACF value belongs to lag 16, so a decomposition with 16-day frequency should be sufficient.
In this case, the random part of the time series decomposed with 16-day frequency seems closest to a randomly distributed series with mean zero and standard deviation one, so it is chosen as the final decomposition.
Looking at the ACF, 2, 5, or 8 may be selected for the ‘q’ value; looking at the PACF, 3 or 4 may be selected for the ‘p’ value. The auto.arima function is used as well. The AIC and BIC values of the candidate models can be seen below. The AIC value of the ARIMA(3,0,5) model, selected by inspecting the ACF and PACF plots, is smaller than that of the ARIMA(1,0,3) model suggested by auto.arima. Therefore, ARIMA(3,0,5) is used for the performance comparison with the linear models.
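The order-selection step above can be sketched as follows, assuming `rand16` is the random component of the 16-day-frequency decomposition:

```r
library(forecast)

# Inspect ACF/PACF of the random component to pick candidate q and p.
acf(rand16,  na.action = na.pass)
pacf(rand16, na.action = na.pass)

# Fit the manually selected model and the auto.arima suggestion,
# then compare their AIC values.
manual <- arima(rand16, order = c(3, 0, 5))
auto   <- auto.arima(rand16)
AIC(manual)
AIC(auto)
```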
Below, you can see the correlations between the attributes. According to this matrix, basket_count, favored_count, is_campaign and category_sold can be added to the model in different combinations. Since a monthly change in the data was observed in the boxplots above, the month information can also be added to the candidate models.
The performance of the different linear regression and ARIMA models on the test dates will be calculated, and the best model will be selected accordingly.
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6394.698
## ARIMA(0,0,0) with non-zero mean : 6220.27
## ARIMA(0,0,1) with zero mean : 6126.411
## ARIMA(0,0,1) with non-zero mean : 6009.282
## ARIMA(0,0,2) with zero mean : 6043.806
## ARIMA(0,0,2) with non-zero mean : 5957.195
## ARIMA(0,0,3) with zero mean : 5942.598
## ARIMA(0,0,3) with non-zero mean : 5884.053
## ARIMA(0,0,4) with zero mean : 5921.67
## ARIMA(0,0,4) with non-zero mean : 5877.716
## ARIMA(0,0,5) with zero mean : 5918.286
## ARIMA(0,0,5) with non-zero mean : 5879.596
## ARIMA(1,0,0) with zero mean : 5928.848
## ARIMA(1,0,0) with non-zero mean : 5911.463
## ARIMA(1,0,1) with zero mean : 5929.506
## ARIMA(1,0,1) with non-zero mean : 5909.434
## ARIMA(1,0,2) with zero mean : 5926.647
## ARIMA(1,0,2) with non-zero mean : 5903.617
## ARIMA(1,0,3) with zero mean : 5911.226
## ARIMA(1,0,3) with non-zero mean : 5877.901
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 5879.617
## ARIMA(2,0,0) with zero mean : 5929.216
## ARIMA(2,0,0) with non-zero mean : 5907.817
## ARIMA(2,0,1) with zero mean : 5930.771
## ARIMA(2,0,1) with non-zero mean : 5902.491
## ARIMA(2,0,2) with zero mean : 5925.483
## ARIMA(2,0,2) with non-zero mean : 5891.825
## ARIMA(2,0,3) with zero mean : 5911.948
## ARIMA(2,0,3) with non-zero mean : 5879.561
## ARIMA(3,0,0) with zero mean : 5928.15
## ARIMA(3,0,0) with non-zero mean : 5900.006
## ARIMA(3,0,1) with zero mean : 5930.061
## ARIMA(3,0,1) with non-zero mean : 5899.854
## ARIMA(3,0,2) with zero mean : 5933.983
## ARIMA(3,0,2) with non-zero mean : 5887.709
## ARIMA(4,0,0) with zero mean : 5929.24
## ARIMA(4,0,0) with non-zero mean : 5894.763
## ARIMA(4,0,1) with zero mean : 5925.197
## ARIMA(4,0,1) with non-zero mean : 5891.308
## ARIMA(5,0,0) with zero mean : 5906.657
## ARIMA(5,0,0) with non-zero mean : 5884.985
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
##
## ARIMA(0,0,0) with zero mean : 6394.698
## ARIMA(0,0,0) with non-zero mean : 6220.27
## ARIMA(0,0,0)(0,0,1)[16] with zero mean : 6294.138
## ARIMA(0,0,0)(0,0,1)[16] with non-zero mean : 6182.676
## ARIMA(0,0,0)(0,0,2)[16] with zero mean : 6267.205
## ARIMA(0,0,0)(0,0,2)[16] with non-zero mean : 6181.942
## ARIMA(0,0,0)(1,0,0)[16] with zero mean : 6247.732
## ARIMA(0,0,0)(1,0,0)[16] with non-zero mean : 6180.785
## ARIMA(0,0,0)(1,0,1)[16] with zero mean : 6236.257
## ARIMA(0,0,0)(1,0,1)[16] with non-zero mean : 6182.301
## ARIMA(0,0,0)(1,0,2)[16] with zero mean : Inf
## ARIMA(0,0,0)(1,0,2)[16] with non-zero mean : 6183.096
## ARIMA(0,0,0)(2,0,0)[16] with zero mean : 6244.526
## ARIMA(0,0,0)(2,0,0)[16] with non-zero mean : 6182.286
## ARIMA(0,0,0)(2,0,1)[16] with zero mean : Inf
## ARIMA(0,0,0)(2,0,1)[16] with non-zero mean : Inf
## ARIMA(0,0,0)(2,0,2)[16] with zero mean : Inf
## ARIMA(0,0,0)(2,0,2)[16] with non-zero mean : Inf
## ARIMA(0,0,1) with zero mean : 6126.411
## ARIMA(0,0,1) with non-zero mean : 6009.282
## ARIMA(0,0,1)(0,0,1)[16] with zero mean : 6069.038
## ARIMA(0,0,1)(0,0,1)[16] with non-zero mean : 5987.659
## ARIMA(0,0,1)(0,0,2)[16] with zero mean : 6052.357
## ARIMA(0,0,1)(0,0,2)[16] with non-zero mean : 5987.348
## ARIMA(0,0,1)(1,0,0)[16] with zero mean : 6043.194
## ARIMA(0,0,1)(1,0,0)[16] with non-zero mean : 5985.413
## ARIMA(0,0,1)(1,0,1)[16] with zero mean : 6028.021
## ARIMA(0,0,1)(1,0,1)[16] with non-zero mean : 5987.431
## ARIMA(0,0,1)(1,0,2)[16] with zero mean : Inf
## ARIMA(0,0,1)(1,0,2)[16] with non-zero mean : 5989.218
## ARIMA(0,0,1)(2,0,0)[16] with zero mean : 6037.264
## ARIMA(0,0,1)(2,0,0)[16] with non-zero mean : 5987.43
## ARIMA(0,0,1)(2,0,1)[16] with zero mean : Inf
## ARIMA(0,0,1)(2,0,1)[16] with non-zero mean : 5989.3
## ARIMA(0,0,1)(2,0,2)[16] with zero mean : Inf
## ARIMA(0,0,1)(2,0,2)[16] with non-zero mean : 5991.24
## ARIMA(0,0,2) with zero mean : 6043.806
## ARIMA(0,0,2) with non-zero mean : 5957.195
## ARIMA(0,0,2)(0,0,1)[16] with zero mean : 6001.241
## ARIMA(0,0,2)(0,0,1)[16] with non-zero mean : 5939.246
## ARIMA(0,0,2)(0,0,2)[16] with zero mean : 5992.86
## ARIMA(0,0,2)(0,0,2)[16] with non-zero mean : 5940.305
## ARIMA(0,0,2)(1,0,0)[16] with zero mean : 5986.603
## ARIMA(0,0,2)(1,0,0)[16] with non-zero mean : 5938.173
## ARIMA(0,0,2)(1,0,1)[16] with zero mean : 5977.252
## ARIMA(0,0,2)(1,0,1)[16] with non-zero mean : 5940.24
## ARIMA(0,0,2)(1,0,2)[16] with zero mean : Inf
## ARIMA(0,0,2)(1,0,2)[16] with non-zero mean : 5942.317
## ARIMA(0,0,2)(2,0,0)[16] with zero mean : 5983.547
## ARIMA(0,0,2)(2,0,0)[16] with non-zero mean : 5940.24
## ARIMA(0,0,2)(2,0,1)[16] with zero mean : Inf
## ARIMA(0,0,2)(2,0,1)[16] with non-zero mean : 5942.318
## ARIMA(0,0,3) with zero mean : 5942.598
## ARIMA(0,0,3) with non-zero mean : 5884.053
## ARIMA(0,0,3)(0,0,1)[16] with zero mean : 5917.294
## ARIMA(0,0,3)(0,0,1)[16] with non-zero mean : 5873.652
## ARIMA(0,0,3)(0,0,2)[16] with zero mean : 5915.625
## ARIMA(0,0,3)(0,0,2)[16] with non-zero mean : 5875.614
## ARIMA(0,0,3)(1,0,0)[16] with zero mean : 5911.389
## ARIMA(0,0,3)(1,0,0)[16] with non-zero mean : 5873.482
## ARIMA(0,0,3)(1,0,1)[16] with zero mean : 5904.144
## ARIMA(0,0,3)(1,0,1)[16] with non-zero mean : 5875.549
## ARIMA(0,0,3)(2,0,0)[16] with zero mean : 5910.439
## ARIMA(0,0,3)(2,0,0)[16] with non-zero mean : 5875.553
## ARIMA(0,0,4) with zero mean : 5921.67
## ARIMA(0,0,4) with non-zero mean : 5877.716
## ARIMA(0,0,4)(0,0,1)[16] with zero mean : 5902.592
## ARIMA(0,0,4)(0,0,1)[16] with non-zero mean : 5868.317
## ARIMA(0,0,4)(1,0,0)[16] with zero mean : 5898.323
## ARIMA(0,0,4)(1,0,0)[16] with non-zero mean : 5867.987
## ARIMA(0,0,5) with zero mean : 5918.286
## ARIMA(0,0,5) with non-zero mean : 5879.596
## ARIMA(1,0,0) with zero mean : 5928.848
## ARIMA(1,0,0) with non-zero mean : 5911.463
## ARIMA(1,0,0)(0,0,1)[16] with zero mean : 5913.662
## ARIMA(1,0,0)(0,0,1)[16] with non-zero mean : 5898.296
## ARIMA(1,0,0)(0,0,2)[16] with zero mean : 5915.174
## ARIMA(1,0,0)(0,0,2)[16] with non-zero mean : 5900.264
## ARIMA(1,0,0)(1,0,0)[16] with zero mean : 5912.762
## ARIMA(1,0,0)(1,0,0)[16] with non-zero mean : 5898.366
## ARIMA(1,0,0)(1,0,1)[16] with zero mean : 5914.729
## ARIMA(1,0,0)(1,0,1)[16] with non-zero mean : 5900.247
## ARIMA(1,0,0)(1,0,2)[16] with zero mean : 5915.576
## ARIMA(1,0,0)(1,0,2)[16] with non-zero mean : Inf
## ARIMA(1,0,0)(2,0,0)[16] with zero mean : 5914.766
## ARIMA(1,0,0)(2,0,0)[16] with non-zero mean : 5900.282
## ARIMA(1,0,0)(2,0,1)[16] with zero mean : Inf
## ARIMA(1,0,0)(2,0,1)[16] with non-zero mean : Inf
## ARIMA(1,0,0)(2,0,2)[16] with zero mean : Inf
## ARIMA(1,0,0)(2,0,2)[16] with non-zero mean : 5904.293
## ARIMA(1,0,1) with zero mean : 5929.506
## ARIMA(1,0,1) with non-zero mean : 5909.434
## ARIMA(1,0,1)(0,0,1)[16] with zero mean : 5914.708
## ARIMA(1,0,1)(0,0,1)[16] with non-zero mean : 5897.18
## ARIMA(1,0,1)(0,0,2)[16] with zero mean : 5915.989
## ARIMA(1,0,1)(0,0,2)[16] with non-zero mean : 5899.016
## ARIMA(1,0,1)(1,0,0)[16] with zero mean : 5913.493
## ARIMA(1,0,1)(1,0,0)[16] with non-zero mean : 5896.933
## ARIMA(1,0,1)(1,0,1)[16] with zero mean : 5915.217
## ARIMA(1,0,1)(1,0,1)[16] with non-zero mean : 5898.97
## ARIMA(1,0,1)(1,0,2)[16] with zero mean : 5915.94
## ARIMA(1,0,1)(1,0,2)[16] with non-zero mean : 5900.99
## ARIMA(1,0,1)(2,0,0)[16] with zero mean : 5915.382
## ARIMA(1,0,1)(2,0,0)[16] with non-zero mean : 5898.974
## ARIMA(1,0,1)(2,0,1)[16] with zero mean : Inf
## ARIMA(1,0,1)(2,0,1)[16] with non-zero mean : 5901.041
## ARIMA(1,0,2) with zero mean : 5926.647
## ARIMA(1,0,2) with non-zero mean : 5903.617
## ARIMA(1,0,2)(0,0,1)[16] with zero mean : 5912.013
## ARIMA(1,0,2)(0,0,1)[16] with non-zero mean : 5892.174
## ARIMA(1,0,2)(0,0,2)[16] with zero mean : 5913.573
## ARIMA(1,0,2)(0,0,2)[16] with non-zero mean : 5894.22
## ARIMA(1,0,2)(1,0,0)[16] with zero mean : 5910.984
## ARIMA(1,0,2)(1,0,0)[16] with non-zero mean : 5892.276
## ARIMA(1,0,2)(1,0,1)[16] with zero mean : 5912.652
## ARIMA(1,0,2)(1,0,1)[16] with non-zero mean : 5894.206
## ARIMA(1,0,2)(2,0,0)[16] with zero mean : 5912.904
## ARIMA(1,0,2)(2,0,0)[16] with non-zero mean : 5894.253
## ARIMA(1,0,3) with zero mean : 5911.226
## ARIMA(1,0,3) with non-zero mean : 5877.901
## ARIMA(1,0,3)(0,0,1)[16] with zero mean : 5896.345
## ARIMA(1,0,3)(0,0,1)[16] with non-zero mean : 5868.702
## ARIMA(1,0,3)(1,0,0)[16] with zero mean : 5893.882
## ARIMA(1,0,3)(1,0,0)[16] with non-zero mean : 5868.454
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 5879.617
## ARIMA(2,0,0) with zero mean : 5929.216
## ARIMA(2,0,0) with non-zero mean : 5907.817
## ARIMA(2,0,0)(0,0,1)[16] with zero mean : 5914.481
## ARIMA(2,0,0)(0,0,1)[16] with non-zero mean : 5895.919
## ARIMA(2,0,0)(0,0,2)[16] with zero mean : 5915.699
## ARIMA(2,0,0)(0,0,2)[16] with non-zero mean : 5897.689
## ARIMA(2,0,0)(1,0,0)[16] with zero mean : 5913.188
## ARIMA(2,0,0)(1,0,0)[16] with non-zero mean : 5895.566
## ARIMA(2,0,0)(1,0,1)[16] with zero mean : 5914.807
## ARIMA(2,0,0)(1,0,1)[16] with non-zero mean : 5897.626
## ARIMA(2,0,0)(1,0,2)[16] with zero mean : 5915.466
## ARIMA(2,0,0)(1,0,2)[16] with non-zero mean : 5899.655
## ARIMA(2,0,0)(2,0,0)[16] with zero mean : 5914.183
## ARIMA(2,0,0)(2,0,0)[16] with non-zero mean : 5897.627
## ARIMA(2,0,0)(2,0,1)[16] with zero mean : Inf
## ARIMA(2,0,0)(2,0,1)[16] with non-zero mean : 5899.699
## ARIMA(2,0,1) with zero mean : 5930.771
## ARIMA(2,0,1) with non-zero mean : 5902.491
## ARIMA(2,0,1)(0,0,1)[16] with zero mean : 5915.467
## ARIMA(2,0,1)(0,0,1)[16] with non-zero mean : 5890.756
## ARIMA(2,0,1)(0,0,2)[16] with zero mean : 5916.721
## ARIMA(2,0,1)(0,0,2)[16] with non-zero mean : 5892.439
## ARIMA(2,0,1)(1,0,0)[16] with zero mean : 5914.155
## ARIMA(2,0,1)(1,0,0)[16] with non-zero mean : 5890.223
## ARIMA(2,0,1)(1,0,1)[16] with zero mean : 5914.858
## ARIMA(2,0,1)(1,0,1)[16] with non-zero mean : 5892.293
## ARIMA(2,0,1)(2,0,0)[16] with zero mean : 5916.06
## ARIMA(2,0,1)(2,0,0)[16] with non-zero mean : 5892.295
## ARIMA(2,0,2) with zero mean : 5925.483
## ARIMA(2,0,2) with non-zero mean : 5891.825
## ARIMA(2,0,2)(0,0,1)[16] with zero mean : 5910.169
## ARIMA(2,0,2)(0,0,1)[16] with non-zero mean : 5880.613
## ARIMA(2,0,2)(1,0,0)[16] with zero mean : 5908.505
## ARIMA(2,0,2)(1,0,0)[16] with non-zero mean : 5880.4
## ARIMA(2,0,3) with zero mean : 5911.948
## ARIMA(2,0,3) with non-zero mean : 5879.561
## ARIMA(3,0,0) with zero mean : 5928.15
## ARIMA(3,0,0) with non-zero mean : 5900.006
## ARIMA(3,0,0)(0,0,1)[16] with zero mean : 5913.108
## ARIMA(3,0,0)(0,0,1)[16] with non-zero mean : 5888.623
## ARIMA(3,0,0)(0,0,2)[16] with zero mean : 5914.335
## ARIMA(3,0,0)(0,0,2)[16] with non-zero mean : 5890.524
## ARIMA(3,0,0)(1,0,0)[16] with zero mean : 5911.652
## ARIMA(3,0,0)(1,0,0)[16] with non-zero mean : 5888.37
## ARIMA(3,0,0)(1,0,1)[16] with zero mean : 5912.781
## ARIMA(3,0,0)(1,0,1)[16] with non-zero mean : 5890.44
## ARIMA(3,0,0)(2,0,0)[16] with zero mean : 5913.366
## ARIMA(3,0,0)(2,0,0)[16] with non-zero mean : 5890.443
## ARIMA(3,0,1) with zero mean : 5930.061
## ARIMA(3,0,1) with non-zero mean : 5899.854
## ARIMA(3,0,1)(0,0,1)[16] with zero mean : 5914.874
## ARIMA(3,0,1)(0,0,1)[16] with non-zero mean : 5888.16
## ARIMA(3,0,1)(1,0,0)[16] with zero mean : 5913.344
## ARIMA(3,0,1)(1,0,0)[16] with non-zero mean : 5887.858
## ARIMA(3,0,2) with zero mean : 5933.983
## ARIMA(3,0,2) with non-zero mean : 5887.709
## ARIMA(4,0,0) with zero mean : 5929.24
## ARIMA(4,0,0) with non-zero mean : 5894.763
## ARIMA(4,0,0)(0,0,1)[16] with zero mean : 5913.391
## ARIMA(4,0,0)(0,0,1)[16] with non-zero mean : 5882.648
## ARIMA(4,0,0)(1,0,0)[16] with zero mean : 5911.555
## ARIMA(4,0,0)(1,0,0)[16] with non-zero mean : 5882.25
## ARIMA(4,0,1) with zero mean : 5925.197
## ARIMA(4,0,1) with non-zero mean : 5891.308
## ARIMA(5,0,0) with zero mean : 5906.657
## ARIMA(5,0,0) with non-zero mean : 5884.985
##
##
##
## Best model: ARIMA(0,0,4)(1,0,0)[16] with non-zero mean
##
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6411.073
## ARIMA(0,0,0) with non-zero mean : 6236.361
## ARIMA(0,0,1) with zero mean : 6142.057
## ARIMA(0,0,1) with non-zero mean : 6024.666
## ARIMA(0,0,2) with zero mean : 6059.217
## ARIMA(0,0,2) with non-zero mean : 5972.392
## ARIMA(0,0,3) with zero mean : 5957.704
## ARIMA(0,0,3) with non-zero mean : 5899.022
## ARIMA(0,0,4) with zero mean : 5936.713
## ARIMA(0,0,4) with non-zero mean : 5892.644
## ARIMA(0,0,5) with zero mean : 5933.312
## ARIMA(0,0,5) with non-zero mean : 5894.52
## ARIMA(1,0,0) with zero mean : 5943.923
## ARIMA(1,0,0) with non-zero mean : 5926.474
## ARIMA(1,0,1) with zero mean : 5944.579
## ARIMA(1,0,1) with non-zero mean : 5924.436
## ARIMA(1,0,2) with zero mean : 5941.704
## ARIMA(1,0,2) with non-zero mean : 5918.603
## ARIMA(1,0,3) with zero mean : 5926.236
## ARIMA(1,0,3) with non-zero mean : 5892.825
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 5894.542
## ARIMA(2,0,0) with zero mean : 5944.289
## ARIMA(2,0,0) with non-zero mean : 5922.817
## ARIMA(2,0,1) with zero mean : 5945.842
## ARIMA(2,0,1) with non-zero mean : 5917.489
## ARIMA(2,0,2) with zero mean : 5940.532
## ARIMA(2,0,2) with non-zero mean : 5906.798
## ARIMA(2,0,3) with zero mean : 5926.952
## ARIMA(2,0,3) with non-zero mean : 5894.486
## ARIMA(3,0,0) with zero mean : 5943.214
## ARIMA(3,0,0) with non-zero mean : 5914.994
## ARIMA(3,0,1) with zero mean : 5945.125
## ARIMA(3,0,1) with non-zero mean : 5914.844
## ARIMA(3,0,2) with zero mean : 5949.048
## ARIMA(3,0,2) with non-zero mean : 5902.651
## ARIMA(4,0,0) with zero mean : 5944.301
## ARIMA(4,0,0) with non-zero mean : 5909.75
## ARIMA(4,0,1) with zero mean : 5940.242
## ARIMA(4,0,1) with non-zero mean : 5906.275
## ARIMA(5,0,0) with zero mean : 5921.646
## ARIMA(5,0,0) with non-zero mean : 5899.919
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6427.441
## ARIMA(0,0,0) with non-zero mean : 6252.459
## ARIMA(0,0,1) with zero mean : 6157.663
## ARIMA(0,0,1) with non-zero mean : 6040.13
## ARIMA(0,0,2) with zero mean : 6074.59
## ARIMA(0,0,2) with non-zero mean : 5987.647
## ARIMA(0,0,3) with zero mean : 5972.797
## ARIMA(0,0,3) with non-zero mean : 5914.014
## ARIMA(0,0,4) with zero mean : 5951.733
## ARIMA(0,0,4) with non-zero mean : 5907.603
## ARIMA(0,0,5) with zero mean : 5948.315
## ARIMA(0,0,5) with non-zero mean : 5909.476
## ARIMA(1,0,0) with zero mean : 5958.974
## ARIMA(1,0,0) with non-zero mean : 5941.511
## ARIMA(1,0,1) with zero mean : 5959.626
## ARIMA(1,0,1) with non-zero mean : 5939.472
## ARIMA(1,0,2) with zero mean : 5956.74
## ARIMA(1,0,2) with non-zero mean : 5933.614
## ARIMA(1,0,3) with zero mean : 5941.224
## ARIMA(1,0,3) with non-zero mean : 5907.777
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 5909.497
## ARIMA(2,0,0) with zero mean : 5959.336
## ARIMA(2,0,0) with non-zero mean : 5937.855
## ARIMA(2,0,1) with zero mean : 5960.886
## ARIMA(2,0,1) with non-zero mean : 5932.536
## ARIMA(2,0,2) with zero mean : 5955.56
## ARIMA(2,0,2) with non-zero mean : 5921.801
## ARIMA(2,0,3) with zero mean : 5941.936
## ARIMA(2,0,3) with non-zero mean : 5909.443
## ARIMA(3,0,0) with zero mean : 5958.254
## ARIMA(3,0,0) with non-zero mean : 5930.021
## ARIMA(3,0,1) with zero mean : 5960.163
## ARIMA(3,0,1) with non-zero mean : 5929.876
## ARIMA(3,0,2) with zero mean : 5964.043
## ARIMA(3,0,2) with non-zero mean : 5917.618
## ARIMA(4,0,0) with zero mean : 5959.338
## ARIMA(4,0,0) with non-zero mean : 5924.785
## ARIMA(4,0,1) with zero mean : 5955.262
## ARIMA(4,0,1) with non-zero mean : 5921.289
## ARIMA(5,0,0) with zero mean : 5936.613
## ARIMA(5,0,0) with non-zero mean : 5914.888
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6443.865
## ARIMA(0,0,0) with non-zero mean : 6268.435
## ARIMA(0,0,1) with zero mean : 6173.382
## ARIMA(0,0,1) with non-zero mean : 6055.43
## ARIMA(0,0,2) with zero mean : 6090.049
## ARIMA(0,0,2) with non-zero mean : 6002.788
## ARIMA(0,0,3) with zero mean : 5987.993
## ARIMA(0,0,3) with non-zero mean : 5928.924
## ARIMA(0,0,4) with zero mean : 5966.857
## ARIMA(0,0,4) with non-zero mean : 5922.489
## ARIMA(0,0,5) with zero mean : 5963.413
## ARIMA(0,0,5) with non-zero mean : 5924.36
## ARIMA(1,0,0) with zero mean : 5974.091
## ARIMA(1,0,0) with non-zero mean : 5956.506
## ARIMA(1,0,1) with zero mean : 5974.742
## ARIMA(1,0,1) with non-zero mean : 5954.456
## ARIMA(1,0,2) with zero mean : 5971.838
## ARIMA(1,0,2) with non-zero mean : 5948.576
## ARIMA(1,0,3) with zero mean : 5956.299
## ARIMA(1,0,3) with non-zero mean : 5922.662
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 5924.381
## ARIMA(2,0,0) with zero mean : 5974.451
## ARIMA(2,0,0) with non-zero mean : 5952.834
## ARIMA(2,0,1) with zero mean : 5976.002
## ARIMA(2,0,1) with non-zero mean : 5947.503
## ARIMA(2,0,2) with zero mean : 5970.652
## ARIMA(2,0,2) with non-zero mean : 5936.729
## ARIMA(2,0,3) with zero mean : 5957.005
## ARIMA(2,0,3) with non-zero mean : 5924.327
## ARIMA(3,0,0) with zero mean : 5973.359
## ARIMA(3,0,0) with non-zero mean : 5944.976
## ARIMA(3,0,1) with zero mean : 5975.269
## ARIMA(3,0,1) with non-zero mean : 5944.828
## ARIMA(3,0,2) with zero mean : 5979.166
## ARIMA(3,0,2) with non-zero mean : 5932.525
## ARIMA(4,0,0) with zero mean : 5974.444
## ARIMA(4,0,0) with non-zero mean : 5939.724
## ARIMA(4,0,1) with zero mean : 5970.356
## ARIMA(4,0,1) with non-zero mean : 5936.211
## ARIMA(5,0,0) with zero mean : 5951.655
## ARIMA(5,0,0) with non-zero mean : 5929.788
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6460.525
## ARIMA(0,0,0) with non-zero mean : 6284.275
## ARIMA(0,0,1) with zero mean : 6189.3
## ARIMA(0,0,1) with non-zero mean : 6070.695
## ARIMA(0,0,2) with zero mean : 6105.797
## ARIMA(0,0,2) with non-zero mean : 6017.938
## ARIMA(0,0,3) with zero mean : 6003.444
## ARIMA(0,0,3) with non-zero mean : 5943.901
## ARIMA(0,0,4) with zero mean : 5982.251
## ARIMA(0,0,4) with non-zero mean : 5937.465
## ARIMA(0,0,5) with zero mean : 5978.782
## ARIMA(0,0,5) with non-zero mean : 5939.338
## ARIMA(1,0,0) with zero mean : 5989.48
## ARIMA(1,0,0) with non-zero mean : 5971.634
## ARIMA(1,0,1) with zero mean : 5990.12
## ARIMA(1,0,1) with non-zero mean : 5969.553
## ARIMA(1,0,2) with zero mean : 5987.224
## ARIMA(1,0,2) with non-zero mean : 5963.659
## ARIMA(1,0,3) with zero mean : 5971.637
## ARIMA(1,0,3) with non-zero mean : 5937.644
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 5939.375
## ARIMA(2,0,0) with zero mean : 5989.827
## ARIMA(2,0,0) with non-zero mean : 5967.918
## ARIMA(2,0,1) with zero mean : 5991.371
## ARIMA(2,0,1) with non-zero mean : 5962.536
## ARIMA(2,0,2) with zero mean : 5986.037
## ARIMA(2,0,2) with non-zero mean : 5951.741
## ARIMA(2,0,3) with zero mean : 5972.331
## ARIMA(2,0,3) with non-zero mean : 5939.304
## ARIMA(3,0,0) with zero mean : 5988.74
## ARIMA(3,0,0) with non-zero mean : 5960.018
## ARIMA(3,0,1) with zero mean : 5990.65
## ARIMA(3,0,1) with non-zero mean : 5959.852
## ARIMA(3,0,2) with zero mean : 5994.578
## ARIMA(3,0,2) with non-zero mean : 5947.546
## ARIMA(4,0,0) with zero mean : 5989.823
## ARIMA(4,0,0) with non-zero mean : 5954.721
## ARIMA(4,0,1) with zero mean : 5985.709
## ARIMA(4,0,1) with non-zero mean : 5951.193
## ARIMA(5,0,0) with zero mean : 5966.948
## ARIMA(5,0,0) with non-zero mean : 5944.777
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6477.126
## ARIMA(0,0,0) with non-zero mean : 6300.121
## ARIMA(0,0,1) with zero mean : 6205.004
## ARIMA(0,0,1) with non-zero mean : 6085.987
## ARIMA(0,0,2) with zero mean : 6121.194
## ARIMA(0,0,2) with non-zero mean : 6033.099
## ARIMA(0,0,3) with zero mean : 6018.542
## ARIMA(0,0,3) with non-zero mean : 5958.844
## ARIMA(0,0,4) with zero mean : 5997.263
## ARIMA(0,0,4) with non-zero mean : 5952.394
## ARIMA(0,0,5) with zero mean : 5993.777
## ARIMA(0,0,5) with non-zero mean : 5954.265
## ARIMA(1,0,0) with zero mean : 6004.527
## ARIMA(1,0,0) with non-zero mean : 5986.636
## ARIMA(1,0,1) with zero mean : 6005.161
## ARIMA(1,0,1) with non-zero mean : 5984.556
## ARIMA(1,0,2) with zero mean : 6002.251
## ARIMA(1,0,2) with non-zero mean : 5978.644
## ARIMA(1,0,3) with zero mean : 5986.615
## ARIMA(1,0,3) with non-zero mean : 5952.568
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 5954.287
## ARIMA(2,0,0) with zero mean : 6004.868
## ARIMA(2,0,0) with non-zero mean : 5982.924
## ARIMA(2,0,1) with zero mean : 6006.412
## ARIMA(2,0,1) with non-zero mean : 5977.545
## ARIMA(2,0,2) with zero mean : 6001.055
## ARIMA(2,0,2) with non-zero mean : 5966.715
## ARIMA(2,0,3) with zero mean : 5987.305
## ARIMA(2,0,3) with non-zero mean : 5954.232
## ARIMA(3,0,0) with zero mean : 6003.771
## ARIMA(3,0,0) with non-zero mean : 5975.018
## ARIMA(3,0,1) with zero mean : 6005.681
## ARIMA(3,0,1) with non-zero mean : 5974.852
## ARIMA(3,0,2) with zero mean : 6009.584
## ARIMA(3,0,2) with non-zero mean : 5962.49
## ARIMA(4,0,0) with zero mean : 6004.851
## ARIMA(4,0,0) with non-zero mean : 5969.711
## ARIMA(4,0,1) with zero mean : 6000.722
## ARIMA(4,0,1) with non-zero mean : 5966.162
## ARIMA(5,0,0) with zero mean : 5981.909
## ARIMA(5,0,0) with non-zero mean : 5959.712
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6493.708
## ARIMA(0,0,0) with non-zero mean : 6315.97
## ARIMA(0,0,1) with zero mean : 6220.824
## ARIMA(0,0,1) with non-zero mean : 6101.242
## ARIMA(0,0,2) with zero mean : 6136.698
## ARIMA(0,0,2) with non-zero mean : 6048.213
## ARIMA(0,0,3) with zero mean : 6033.648
## ARIMA(0,0,3) with non-zero mean : 5973.783
## ARIMA(0,0,4) with zero mean : 6012.288
## ARIMA(0,0,4) with non-zero mean : 5967.309
## ARIMA(0,0,5) with zero mean : 6008.774
## ARIMA(0,0,5) with non-zero mean : 5969.181
## ARIMA(1,0,0) with zero mean : 6019.581
## ARIMA(1,0,0) with non-zero mean : 6001.628
## ARIMA(1,0,1) with zero mean : 6020.215
## ARIMA(1,0,1) with non-zero mean : 5999.536
## ARIMA(1,0,2) with zero mean : 6017.279
## ARIMA(1,0,2) with non-zero mean : 5993.619
## ARIMA(1,0,3) with zero mean : 6001.594
## ARIMA(1,0,3) with non-zero mean : 5967.485
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 5969.202
## ARIMA(2,0,0) with zero mean : 6019.921
## ARIMA(2,0,0) with non-zero mean : 5997.9
## ARIMA(2,0,1) with zero mean : 6021.464
## ARIMA(2,0,1) with non-zero mean : 5992.514
## ARIMA(2,0,2) with zero mean : 6016.071
## ARIMA(2,0,2) with non-zero mean : 5981.685
## ARIMA(2,0,3) with zero mean : 6002.278
## ARIMA(2,0,3) with non-zero mean : 5969.148
## ARIMA(3,0,0) with zero mean : 6018.809
## ARIMA(3,0,0) with non-zero mean : 5989.986
## ARIMA(3,0,1) with zero mean : 6020.718
## ARIMA(3,0,1) with non-zero mean : 5989.823
## ARIMA(3,0,2) with zero mean : 6024.638
## ARIMA(3,0,2) with non-zero mean : 5977.429
## ARIMA(4,0,0) with zero mean : 6019.886
## ARIMA(4,0,0) with non-zero mean : 5984.677
## ARIMA(4,0,1) with zero mean : 6015.737
## ARIMA(4,0,1) with non-zero mean : 5981.114
## ARIMA(5,0,0) with zero mean : 5996.869
## ARIMA(5,0,0) with non-zero mean : 5974.636
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6510.13
## ARIMA(0,0,0) with non-zero mean : 6331.928
## ARIMA(0,0,1) with zero mean : 6236.414
## ARIMA(0,0,1) with non-zero mean : 6116.696
## ARIMA(0,0,2) with zero mean : 6152.059
## ARIMA(0,0,2) with non-zero mean : 6063.454
## ARIMA(0,0,3) with zero mean : 6048.71
## ARIMA(0,0,3) with non-zero mean : 5988.861
## ARIMA(0,0,4) with zero mean : 6027.292
## ARIMA(0,0,4) with non-zero mean : 5982.374
## ARIMA(0,0,5) with zero mean : 6023.767
## ARIMA(0,0,5) with non-zero mean : 5984.244
## ARIMA(1,0,0) with zero mean : 6034.656
## ARIMA(1,0,0) with non-zero mean : 6016.774
## ARIMA(1,0,1) with zero mean : 6035.284
## ARIMA(1,0,1) with non-zero mean : 6014.674
## ARIMA(1,0,2) with zero mean : 6032.32
## ARIMA(1,0,2) with non-zero mean : 6008.712
## ARIMA(1,0,3) with zero mean : 6016.585
## ARIMA(1,0,3) with non-zero mean : 5982.549
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 5984.266
## ARIMA(2,0,0) with zero mean : 6034.988
## ARIMA(2,0,0) with non-zero mean : 6013.033
## ARIMA(2,0,1) with zero mean : 6036.536
## ARIMA(2,0,1) with non-zero mean : 6007.66
## ARIMA(2,0,2) with zero mean : 6031.1
## ARIMA(2,0,2) with non-zero mean : 5996.766
## ARIMA(2,0,3) with zero mean : 6017.27
## ARIMA(2,0,3) with non-zero mean : 5984.212
## ARIMA(3,0,0) with zero mean : 6033.859
## ARIMA(3,0,0) with non-zero mean : 6005.092
## ARIMA(3,0,1) with zero mean : 6035.767
## ARIMA(3,0,1) with non-zero mean : 6004.945
## ARIMA(3,0,2) with zero mean : 6039.674
## ARIMA(3,0,2) with non-zero mean : 5992.479
## ARIMA(4,0,0) with zero mean : 6034.938
## ARIMA(4,0,0) with non-zero mean : 5999.836
## ARIMA(4,0,1) with zero mean : 6030.776
## ARIMA(4,0,1) with non-zero mean : 5996.248
## ARIMA(5,0,0) with zero mean : 6011.85
## ARIMA(5,0,0) with non-zero mean : 5989.709
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6526.45
## ARIMA(0,0,0) with non-zero mean : 6348.171
## ARIMA(0,0,1) with zero mean : 6251.991
## ARIMA(0,0,1) with non-zero mean : 6132.262
## ARIMA(0,0,2) with zero mean : 6167.41
## ARIMA(0,0,2) with non-zero mean : 6078.952
## ARIMA(0,0,3) with zero mean : 6063.771
## ARIMA(0,0,3) with non-zero mean : 6003.996
## ARIMA(0,0,4) with zero mean : 6042.292
## ARIMA(0,0,4) with non-zero mean : 5997.459
## ARIMA(0,0,5) with zero mean : 6038.756
## ARIMA(0,0,5) with non-zero mean : 5999.328
## ARIMA(1,0,0) with zero mean : 6049.799
## ARIMA(1,0,0) with non-zero mean : 6032.081
## ARIMA(1,0,1) with zero mean : 6050.408
## ARIMA(1,0,1) with non-zero mean : 6029.95
## ARIMA(1,0,2) with zero mean : 6047.422
## ARIMA(1,0,2) with non-zero mean : 6023.978
## ARIMA(1,0,3) with zero mean : 6031.571
## ARIMA(1,0,3) with non-zero mean : 5997.635
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 5999.352
## ARIMA(2,0,0) with zero mean : 6050.107
## ARIMA(2,0,0) with non-zero mean : 6028.299
## ARIMA(2,0,1) with zero mean : 6051.657
## ARIMA(2,0,1) with non-zero mean : 6022.938
## ARIMA(2,0,2) with zero mean : 6046.168
## ARIMA(2,0,2) with non-zero mean : 6011.962
## ARIMA(2,0,3) with zero mean : 6032.258
## ARIMA(2,0,3) with non-zero mean : 5999.295
## ARIMA(3,0,0) with zero mean : 6048.962
## ARIMA(3,0,0) with non-zero mean : 6020.35
## ARIMA(3,0,1) with zero mean : 6050.869
## ARIMA(3,0,1) with non-zero mean : 6020.21
## ARIMA(3,0,2) with zero mean : 6054.801
## ARIMA(3,0,2) with non-zero mean : 6007.615
## ARIMA(4,0,0) with zero mean : 6050.032
## ARIMA(4,0,0) with non-zero mean : 6015.083
## ARIMA(4,0,1) with zero mean : 6045.838
## ARIMA(4,0,1) with non-zero mean : 6011.443
## ARIMA(5,0,0) with zero mean : 6026.84
## ARIMA(5,0,0) with non-zero mean : 6004.816
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6542.781
## ARIMA(0,0,0) with non-zero mean : 6364.329
## ARIMA(0,0,1) with zero mean : 6267.595
## ARIMA(0,0,1) with non-zero mean : 6147.668
## ARIMA(0,0,2) with zero mean : 6182.832
## ARIMA(0,0,2) with non-zero mean : 6094.112
## ARIMA(0,0,3) with zero mean : 6078.877
## ARIMA(0,0,3) with non-zero mean : 6018.926
## ARIMA(0,0,4) with zero mean : 6057.375
## ARIMA(0,0,4) with non-zero mean : 6012.338
## ARIMA(0,0,5) with zero mean : 6053.829
## ARIMA(0,0,5) with non-zero mean : 6014.205
## ARIMA(1,0,0) with zero mean : 6064.848
## ARIMA(1,0,0) with non-zero mean : 6047.08
## ARIMA(1,0,1) with zero mean : 6065.46
## ARIMA(1,0,1) with non-zero mean : 6044.934
## ARIMA(1,0,2) with zero mean : 6062.476
## ARIMA(1,0,2) with non-zero mean : 6038.936
## ARIMA(1,0,3) with zero mean : 6046.62
## ARIMA(1,0,3) with non-zero mean : 6012.513
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 6014.226
## ARIMA(2,0,0) with zero mean : 6065.161
## ARIMA(2,0,0) with non-zero mean : 6043.277
## ARIMA(2,0,1) with zero mean : 6066.7
## ARIMA(2,0,1) with non-zero mean : 6037.888
## ARIMA(2,0,2) with zero mean : 6061.234
## ARIMA(2,0,2) with non-zero mean : 6026.877
## ARIMA(2,0,3) with zero mean : 6047.286
## ARIMA(2,0,3) with non-zero mean : 6014.172
## ARIMA(3,0,0) with zero mean : 6064.021
## ARIMA(3,0,0) with non-zero mean : 6035.296
## ARIMA(3,0,1) with zero mean : 6065.928
## ARIMA(3,0,1) with non-zero mean : 6035.151
## ARIMA(3,0,2) with zero mean : 6069.845
## ARIMA(3,0,2) with non-zero mean : 6022.51
## ARIMA(4,0,0) with zero mean : 6065.091
## ARIMA(4,0,0) with non-zero mean : 6030.014
## ARIMA(4,0,1) with zero mean : 6060.875
## ARIMA(4,0,1) with non-zero mean : 6026.361
## ARIMA(5,0,0) with zero mean : 6041.812
## ARIMA(5,0,0) with non-zero mean : 6019.71
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6559.094
## ARIMA(0,0,0) with non-zero mean : 6380.584
## ARIMA(0,0,1) with zero mean : 6283.162
## ARIMA(0,0,1) with non-zero mean : 6163.289
## ARIMA(0,0,2) with zero mean : 6198.17
## ARIMA(0,0,2) with non-zero mean : 6109.503
## ARIMA(0,0,3) with zero mean : 6093.933
## ARIMA(0,0,3) with non-zero mean : 6033.971
## ARIMA(0,0,4) with zero mean : 6072.368
## ARIMA(0,0,4) with non-zero mean : 6027.342
## ARIMA(0,0,5) with zero mean : 6068.806
## ARIMA(0,0,5) with non-zero mean : 6029.196
## ARIMA(1,0,0) with zero mean : 6079.883
## ARIMA(1,0,0) with non-zero mean : 6062.171
## ARIMA(1,0,1) with zero mean : 6080.491
## ARIMA(1,0,1) with non-zero mean : 6060.036
## ARIMA(1,0,2) with zero mean : 6077.488
## ARIMA(1,0,2) with non-zero mean : 6053.981
## ARIMA(1,0,3) with zero mean : 6061.583
## ARIMA(1,0,3) with non-zero mean : 6027.496
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 6029.218
## ARIMA(2,0,0) with zero mean : 6080.191
## ARIMA(2,0,0) with non-zero mean : 6058.382
## ARIMA(2,0,1) with zero mean : 6081.729
## ARIMA(2,0,1) with non-zero mean : 6052.977
## ARIMA(2,0,2) with zero mean : 6076.237
## ARIMA(2,0,2) with non-zero mean : 6041.887
## ARIMA(2,0,3) with zero mean : 6062.246
## ARIMA(2,0,3) with non-zero mean : 6029.165
## ARIMA(3,0,0) with zero mean : 6079.038
## ARIMA(3,0,0) with non-zero mean : 6050.357
## ARIMA(3,0,1) with zero mean : 6080.944
## ARIMA(3,0,1) with non-zero mean : 6050.203
## ARIMA(3,0,2) with zero mean : 6085.093
## ARIMA(3,0,2) with non-zero mean : 6037.523
## ARIMA(4,0,0) with zero mean : 6080.105
## ARIMA(4,0,0) with non-zero mean : 6045.045
## ARIMA(4,0,1) with zero mean : 6075.871
## ARIMA(4,0,1) with non-zero mean : 6041.364
## ARIMA(5,0,0) with zero mean : 6056.76
## ARIMA(5,0,0) with non-zero mean : 6034.691
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6575.409
## ARIMA(0,0,0) with non-zero mean : 6396.792
## ARIMA(0,0,1) with zero mean : 6298.758
## ARIMA(0,0,1) with non-zero mean : 6178.712
## ARIMA(0,0,2) with zero mean : 6213.517
## ARIMA(0,0,2) with non-zero mean : 6124.772
## ARIMA(0,0,3) with zero mean : 6108.991
## ARIMA(0,0,3) with non-zero mean : 6048.978
## ARIMA(0,0,4) with zero mean : 6087.362
## ARIMA(0,0,4) with non-zero mean : 6042.298
## ARIMA(0,0,5) with zero mean : 6083.782
## ARIMA(0,0,5) with non-zero mean : 6044.148
## ARIMA(1,0,0) with zero mean : 6094.915
## ARIMA(1,0,0) with non-zero mean : 6077.183
## ARIMA(1,0,1) with zero mean : 6095.522
## ARIMA(1,0,1) with non-zero mean : 6075.039
## ARIMA(1,0,2) with zero mean : 6092.5
## ARIMA(1,0,2) with non-zero mean : 6068.988
## ARIMA(1,0,3) with zero mean : 6076.547
## ARIMA(1,0,3) with non-zero mean : 6042.444
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 6044.17
## ARIMA(2,0,0) with zero mean : 6095.221
## ARIMA(2,0,0) with non-zero mean : 6073.385
## ARIMA(2,0,1) with zero mean : 6096.766
## ARIMA(2,0,1) with non-zero mean : 6067.976
## ARIMA(2,0,2) with zero mean : 6091.241
## ARIMA(2,0,2) with non-zero mean : 6056.901
## ARIMA(2,0,3) with zero mean : 6077.207
## ARIMA(2,0,3) with non-zero mean : 6044.12
## ARIMA(3,0,0) with zero mean : 6094.059
## ARIMA(3,0,0) with non-zero mean : 6065.363
## ARIMA(3,0,1) with zero mean : 6095.965
## ARIMA(3,0,1) with non-zero mean : 6065.203
## ARIMA(3,0,2) with zero mean : 6099.879
## ARIMA(3,0,2) with non-zero mean : 6052.519
## ARIMA(4,0,0) with zero mean : 6095.127
## ARIMA(4,0,0) with non-zero mean : 6060.024
## ARIMA(4,0,1) with zero mean : 6090.876
## ARIMA(4,0,1) with non-zero mean : 6056.334
## ARIMA(5,0,0) with zero mean : 6071.702
## ARIMA(5,0,0) with non-zero mean : 6049.647
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6591.854
## ARIMA(0,0,0) with non-zero mean : 6412.7
## ARIMA(0,0,1) with zero mean : 6314.502
## ARIMA(0,0,1) with non-zero mean : 6193.966
## ARIMA(0,0,2) with zero mean : 6229.147
## ARIMA(0,0,2) with non-zero mean : 6139.871
## ARIMA(0,0,3) with zero mean : 6124.307
## ARIMA(0,0,3) with non-zero mean : 6063.881
## ARIMA(0,0,4) with zero mean : 6102.601
## ARIMA(0,0,4) with non-zero mean : 6057.187
## ARIMA(0,0,5) with zero mean : 6098.999
## ARIMA(0,0,5) with non-zero mean : 6059.039
## ARIMA(1,0,0) with zero mean : 6110.213
## ARIMA(1,0,0) with non-zero mean : 6092.229
## ARIMA(1,0,1) with zero mean : 6110.815
## ARIMA(1,0,1) with non-zero mean : 6090.062
## ARIMA(1,0,2) with zero mean : 6107.8
## ARIMA(1,0,2) with non-zero mean : 6083.995
## ARIMA(1,0,3) with zero mean : 6091.745
## ARIMA(1,0,3) with non-zero mean : 6057.339
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 6059.06
## ARIMA(2,0,0) with zero mean : 6110.515
## ARIMA(2,0,0) with non-zero mean : 6088.399
## ARIMA(2,0,1) with zero mean : 6112.041
## ARIMA(2,0,1) with non-zero mean : 6082.957
## ARIMA(2,0,2) with zero mean : 6106.526
## ARIMA(2,0,2) with non-zero mean : 6071.829
## ARIMA(2,0,3) with zero mean : 6092.405
## ARIMA(2,0,3) with non-zero mean : 6059.009
## ARIMA(3,0,0) with zero mean : 6109.361
## ARIMA(3,0,0) with non-zero mean : 6080.34
## ARIMA(3,0,1) with zero mean : 6111.267
## ARIMA(3,0,1) with non-zero mean : 6080.166
## ARIMA(3,0,2) with zero mean : 6115.199
## ARIMA(3,0,2) with non-zero mean : 6067.443
## ARIMA(4,0,0) with zero mean : 6110.424
## ARIMA(4,0,0) with non-zero mean : 6074.96
## ARIMA(4,0,1) with zero mean : 6106.144
## ARIMA(4,0,1) with non-zero mean : 6071.252
## ARIMA(5,0,0) with zero mean : 6086.83
## ARIMA(5,0,0) with non-zero mean : 6064.546
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## [1] "input_series=data$sold_count"
##
## ARIMA(0,0,0) with zero mean : 6608.296
## ARIMA(0,0,0) with non-zero mean : 6428.606
## ARIMA(0,0,1) with zero mean : 6330.121
## ARIMA(0,0,1) with non-zero mean : 6209.306
## ARIMA(0,0,2) with zero mean : 6244.493
## ARIMA(0,0,2) with non-zero mean : 6155.067
## ARIMA(0,0,3) with zero mean : 6139.405
## ARIMA(0,0,3) with non-zero mean : 6078.792
## ARIMA(0,0,4) with zero mean : 6117.614
## ARIMA(0,0,4) with non-zero mean : 6072.074
## ARIMA(0,0,5) with zero mean : 6113.99
## ARIMA(0,0,5) with non-zero mean : 6073.924
## ARIMA(1,0,0) with zero mean : 6125.247
## ARIMA(1,0,0) with non-zero mean : 6107.211
## ARIMA(1,0,1) with zero mean : 6125.841
## ARIMA(1,0,1) with non-zero mean : 6105.048
## ARIMA(1,0,2) with zero mean : 6122.815
## ARIMA(1,0,2) with non-zero mean : 6098.955
## ARIMA(1,0,3) with zero mean : 6106.716
## ARIMA(1,0,3) with non-zero mean : 6072.223
## ARIMA(1,0,4) with zero mean : Inf
## ARIMA(1,0,4) with non-zero mean : 6073.946
## ARIMA(2,0,0) with zero mean : 6125.539
## ARIMA(2,0,0) with non-zero mean : 6103.388
## ARIMA(2,0,1) with zero mean : 6127.066
## ARIMA(2,0,1) with non-zero mean : 6097.953
## ARIMA(2,0,2) with zero mean : 6121.532
## ARIMA(2,0,2) with non-zero mean : 6086.778
## ARIMA(2,0,3) with zero mean : 6107.374
## ARIMA(2,0,3) with non-zero mean : 6073.896
## ARIMA(3,0,0) with zero mean : 6124.377
## ARIMA(3,0,0) with non-zero mean : 6095.322
## ARIMA(3,0,1) with zero mean : 6126.283
## ARIMA(3,0,1) with non-zero mean : 6095.148
## ARIMA(3,0,2) with zero mean : 6130.203
## ARIMA(3,0,2) with non-zero mean : 6082.345
## ARIMA(4,0,0) with zero mean : 6125.44
## ARIMA(4,0,0) with non-zero mean : 6089.931
## ARIMA(4,0,1) with zero mean : 6121.146
## ARIMA(4,0,1) with non-zero mean : 6086.197
## ARIMA(5,0,0) with zero mean : 6101.778
## ARIMA(5,0,0) with non-zero mean : 6079.455
##
##
##
## Best model: ARIMA(0,0,4) with non-zero mean
##
## [1] "input_series=ts(data$sold_count,freq=16)"
## variable n mean sd CV FBias MAPE
## 1: lm_prediction2 14 412.4286 232.3915 0.5634709 -4.6142064 5.1883257
## 2: lm_prediction3 14 412.4286 232.3915 0.5634709 -4.7166790 5.3069999
## 3: lm_prediction4 14 412.4286 232.3915 0.5634709 -4.5580423 5.0437342
## 4: lm_prediction5 14 412.4286 232.3915 0.5634709 0.1458778 0.6688978
## 5: lm_prediction6 14 412.4286 232.3915 0.5634709 -4.4771036 4.9528052
## 6: arima_prediction 14 412.4286 232.3915 0.5634709 -0.3479633 0.7460511
## 7: sarima_prediction 14 412.4286 232.3915 0.5634709 -0.2817767 0.6716086
## 8: selected_arima 14 412.4286 232.3915 0.5634709 0.1967600 0.6791326
## RMSE MAD MADP WMAPE
## 1: 2184.1016 1903.0305 4.6142064 4.6142064
## 2: 2238.7716 1945.2932 4.7166790 4.7166790
## 3: 2165.7375 1879.8669 4.5580423 4.5580423
## 4: 303.9193 245.5246 0.5953143 0.5953143
## 5: 2125.5963 1846.4854 4.4771036 4.4771036
## 6: 221.0945 188.7099 0.4575578 0.4575578
## 7: 203.6424 168.6188 0.4088437 0.4088437
## 8: 276.4342 230.1284 0.5579837 0.5579837
The smallest weighted mean absolute percentage error (WMAPE) is obtained for ARIMA(0,0,4) with the 16-day frequency decomposition, which is also the model suggested by auto.arima. Therefore, this model is selected for our prediction purposes.
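The comparison table above reports a set of accuracy measures. A minimal sketch of how they might be computed is below; the exact formulas used in the report are not shown, so these are assumed common definitions (e.g. WMAPE as mean absolute deviation over the mean of the actuals).

```r
# Hedged sketch of the accuracy measures in the comparison table; the
# formulas are assumptions matching common definitions, not the report's code.
accu <- function(actual, forecast) {
  n     <- length(actual)
  error <- actual - forecast
  data.frame(
    n     = n,
    mean  = mean(actual),
    sd    = sd(actual),
    CV    = sd(actual) / mean(actual),
    FBias = sum(error) / sum(actual),   # overall bias ratio
    MAPE  = mean(abs(error / actual)),
    RMSE  = sqrt(mean(error^2)),
    MAD   = mean(abs(error)),
    WMAPE = mean(abs(error)) / mean(actual)
  )
}

res <- accu(c(2, 4), c(1, 5))
```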
In conclusion, here is a plot of the actual test set and the predicted values of the chosen model. As can be seen, the predictions are not far off.
With the selected model, a one-day-ahead prediction can be made using all the data on hand, since a one-day-ahead forecast must be submitted in this competition.
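That one-step-ahead step can be sketched with base R's arima() and the is_campaign regressor, mirroring the fitted model shown above; toy simulated data stands in for the report's detrended series (detrend2) and campaign flags.

```r
# Sketch only: simulated stand-ins for the detrended sold counts and
# the is_campaign indicator used in the report.
set.seed(1)
detrend2    <- ts(rnorm(120), frequency = 16)   # stand-in detrended series
is_campaign <- rbinom(120, 1, 0.1)              # stand-in campaign indicator

fit <- arima(detrend2, order = c(0, 0, 4), xreg = is_campaign)

# one day ahead; newxreg gives tomorrow's campaign flag (assumed 0 here)
pred <- predict(fit, n.ahead = 1, newxreg = 0)
pred$pred   # add the trend/seasonal parts back to obtain the sold-count forecast
```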
## price event_date product_content_id sold_count visit_count favored_count
## 1: 44.86 2021-07-01 31515569 483 10787 649
## basket_count category_sold category_brand_sold category_visits ty_visits
## 1: 2046 7463 1510 418946 106491398
## category_basket category_favored w_day mon is_campaign
## 1: 38687 37332 5 7 0
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.4764
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0238
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
##
## Call:
## arima(x = detrend2, order = c(0, 0, 4), xreg = data_31515569$is_campaign, include.mean = TRUE)
##
## Coefficients:
## ma1 ma2 ma3 ma4 intercept data_31515569$is_campaign
## 0.7895 0.5020 0.2626 -0.0056 0.9158 0.6705
## s.e. 0.0538 0.0719 0.0723 0.0584 0.0542 0.1015
##
## sigma^2 estimated as 0.1698: log likelihood = -206.42, aic = 426.85
## [1] 426.8456
## [1] 454.5546
## Time Series:
## Start = c(26, 4)
## End = c(26, 4)
## Frequency = 16
## [1] 249.9944
## price event_date product_content_id sold_count visit_count favored_count
## 1: 44.86 2021-07-03 31515569 483 10787 649
## basket_count category_sold category_brand_sold category_visits ty_visits
## 1: 2046 7463 1510 418946 106491398
## category_basket category_favored w_day mon is_campaign arima1_prediction
## 1: 38687 37332 5 7 0 249.9944
Before building forecasting models for Product 6, the plot of the data should be examined for seasonality and trend. Below, you can see the plot of the sales quantity of Product 6. Missing sold counts are filled with the mean of the data. There is a slight increasing trend, especially at the beginning and end of the plot. No significant seasonality can be seen. To look further, there is a plot of three months of 2021 (March, April, and May). Again, the seasonality is not significant. In conclusion, it can be said that there is no seasonality.
The first type of model to be built is a linear regression model. First of all, it is wise to select helpful attributes from the correlation matrix. Below, you can see the correlations between the attributes. According to this matrix, only basket_count can be added to the model.
In the first model, this attribute is added. The adjusted R-squared value indicates whether the model is good; the value for the first model is fairly high, which is a good sign. However, there are outliers, probably due to campaigns and holidays, which can be eliminated for a better model. Lastly, the 'lag1' attribute can be added because it is very high in the ACF. In the final linear regression model, the adjusted R-squared value is high enough, and the plots are good enough to make predictions.
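The preprocessing steps described above (mean imputation of the missing sold counts, then a lag1 regressor) can be sketched with toy data; the column names follow the report, but the values are made up.

```r
# Toy stand-in for the Product 6 data (values are hypothetical).
sold <- data.frame(sold_count   = c(10, NA, 14, 9, NA, 12),
                   basket_count = c(80, 85, 95, 70, 75, 88))

# fill the empty sold counts with the mean of the available data
sold$sold_count[is.na(sold$sold_count)] <- mean(sold$sold_count, na.rm = TRUE)

# lag1: yesterday's sold count as an extra regressor
sold$lag1 <- c(NA, head(sold$sold_count, -1))

fit <- lm(sold_count ~ lag1 + basket_count, data = sold)
```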
##
## Call:
## lm(formula = sold_count ~ basket_count, data = sold)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.044 -1.727 1.130 1.130 22.761
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.974707 0.936233 10.65 <2e-16 ***
## basket_count 0.126016 0.005341 23.60 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.745 on 367 degrees of freedom
## Multiple R-squared: 0.6027, Adjusted R-squared: 0.6016
## F-statistic: 556.8 on 1 and 367 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 65.762, df = 10, p-value = 2.897e-10
## sold_count
## Min. : 1.00
## 1st Qu.:32.00
## Median :32.86
## Mean :30.45
## 3rd Qu.:32.86
## Max. :81.00
##
## Call:
## lm(formula = sold_count ~ big_outlier + small_outlier + basket_count,
## data = sold)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.3260 -0.3618 -0.3618 -0.3618 18.0578
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.926503 0.755673 29.02 <2e-16 ***
## big_outlier 8.283919 0.779576 10.63 <2e-16 ***
## small_outlier -13.178828 0.582728 -22.62 <2e-16 ***
## basket_count 0.065431 0.004239 15.44 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.156 on 365 degrees of freedom
## Multiple R-squared: 0.85, Adjusted R-squared: 0.8488
## F-statistic: 689.6 on 3 and 365 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 21.904, df = 10, p-value = 0.0156
##
## Call:
## lm(formula = sold_count ~ lag1 + big_outlier + small_outlier +
## basket_count, data = sold)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.9443 -0.3295 -0.3295 -0.3295 15.8885
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.054336 0.742197 29.715 < 2e-16 ***
## lag1 0.201297 0.051769 3.888 0.00012 ***
## big_outlier 8.315454 0.764965 10.870 < 2e-16 ***
## small_outlier -13.404209 0.574705 -23.324 < 2e-16 ***
## basket_count 0.064925 0.004161 15.601 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.078 on 364 degrees of freedom
## Multiple R-squared: 0.856, Adjusted R-squared: 0.8544
## F-statistic: 541 on 4 and 364 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 10.655, df = 10, p-value = 0.385
The second type of model to be built is an ARIMA model. For this model, the data should first be decomposed. A frequency value must be chosen; since there is no significant seasonality, the highest value in the ACF, which is 9, is used. Additive decomposition is applied for this task. Below, the random series can be seen.
After the decomposition, the (p,d,q) values should be chosen for the model by examining the ACF and PACF. Looking at the ACF, q = 3 can be chosen; looking at the PACF, p = 3 or p = 6 can be chosen. The auto.arima function is used as well. The AIC and BIC values of the candidate models can be seen below. Based on AIC and BIC, the (6,0,3) model is the best among them. After the model is selected, the regressors that correlate most with the sold count are added to improve it. The final model has lower AIC and BIC values, so we can proceed with it.
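A sketch of this decomposition-plus-ARIMA step under the assumptions above (frequency 9, additive decomposition, then ARIMA(6,0,3) on the remainder); the series is simulated, not the real product data.

```r
set.seed(7)
# simulated stand-in for the sold counts: 20 cycles at frequency 9
sold_ts <- ts(50 + arima.sim(list(ma = 0.3), 180), frequency = 9)

dec    <- decompose(sold_ts, type = "additive")
random <- na.omit(dec$random)   # remainder after removing trend and seasonal parts

fit <- arima(random, order = c(6, 0, 3))
AIC(fit)   # compare candidates by AIC (and BIC) as in the report
```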
##
## Call:
## arima(x = detrend, order = c(3, 0, 3))
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 intercept
## 0.9007 -0.0338 -0.3890 -1.1415 -0.169 0.3105 -0.0022
## s.e. 0.5949 0.8402 0.4472 0.5991 0.988 0.3941 0.0030
##
## sigma^2 estimated as 31.95: log likelihood = -1141.11, aic = 2298.23
## [1] 2298.226
## [1] 2329.337
##
## Call:
## arima(x = detrend, order = c(6, 0, 3))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ma1 ma2
## 0.3837 0.1171 -0.3832 -0.1001 -0.0044 -0.2076 -0.6075 -0.4494
## s.e. 0.2530 0.2258 0.1678 0.0954 0.0762 0.0602 0.2598 0.2562
## ma3 intercept
## 0.0571 -0.0023
## s.e. 0.2283 0.0032
##
## sigma^2 estimated as 31.23: log likelihood = -1137.01, aic = 2296.01
## [1] 2296.013
## [1] 2338.791
## Series: detrend
## ARIMA(0,0,1) with non-zero mean
##
## Coefficients:
## ma1 mean
## 0.2199 -0.0189
## s.e. 0.0486 0.4640
##
## sigma^2 estimated as 52.58: log likelihood=-1226.45
## AIC=2458.9 AICc=2458.97 BIC=2470.57
## [1] 2458.899
## [1] 2470.565
##
## Call:
## arima(x = detrend, order = c(6, 0, 3), xreg = xreg)
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ma1 ma2
## 0.6351 0.2570 -0.6182 0.0153 0.0723 -0.1588 -0.8719 -0.5285
## s.e. 0.2349 0.2836 0.2008 0.0886 0.0820 0.0811 0.2360 0.3094
## ma3 intercept xreg
## 0.4429 -0.3081 0.0018
## s.e. 0.2606 0.1347 0.0008
##
## sigma^2 estimated as 30.8: log likelihood = -1132.94, aic = 2289.89
## [1] 2289.885
## [1] 2336.552
We selected two models for prediction; their accuracy values can be seen here. According to the box plot, the weighted mean absolute error of the ARIMA model is higher. We should choose the linear model, because its WMAPE value is lower, which indicates a better model.
## variable n mean sd CV FBias MAPE RMSE
## 1: lm_prediction 14 50.71429 11.75015 0.231693 0.07380659 0.1490211 10.36884
## 2: selected_arima 14 50.71429 11.75015 0.231693 0.05651922 0.2509979 15.59195
## MAD MADP WMAPE
## 1: 8.003332 0.1578122 0.1578122
## 2: 12.670955 0.2498498 0.2498498
In conclusion, here is a plot of the actual test set and the predicted values of the chosen model. As can be seen, the predictions are quite accurate.
Oral-B Rechargeable ToothBrush
First of all, the general behaviour of the data over time is examined with a time plot.
Secondly, the distribution across days and months is plotted to see whether it changes depending on the month and day.
Finally, the relationship with previous observations is examined through the ACF and PACF graphs.
It can be said that there is a trend in the data and, once the trend factor is excluded, the autocorrelation at lag1, lag3, and lag7 is significant.
The boxplots show that the data depends on the month and day factors. Since the day factor is significant, it will be used in model construction instead of lag7, and the frequency of the data is set to 7.
Examination of Attributes
Some of the attributes are not reliable; therefore, they are examined through a summary of the data.
## price event_date product_content_id sold_count
## Min. :110.1 Min. :2020-05-25 Length:404 Min. : 0.00
## 1st Qu.:129.9 1st Qu.:2020-09-02 Class :character 1st Qu.: 20.00
## Median :136.2 Median :2020-12-12 Mode :character Median : 57.00
## Mean :135.3 Mean :2020-12-12 Mean : 94.93
## 3rd Qu.:141.6 3rd Qu.:2021-03-23 3rd Qu.:139.50
## Max. :165.9 Max. :2021-07-02 Max. :513.00
## NA's :9
## visit_count favored_count basket_count category_sold
## Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 321.0
## 1st Qu.: 0 1st Qu.: 0.0 1st Qu.: 92.0 1st Qu.: 609.2
## Median : 0 Median : 171.5 Median : 239.5 Median : 804.5
## Mean : 2270 Mean : 357.6 Mean : 399.7 Mean :1008.6
## 3rd Qu.: 4320 3rd Qu.: 593.8 3rd Qu.: 578.0 3rd Qu.:1101.0
## Max. :15725 Max. :2696.0 Max. :2249.0 Max. :5557.0
##
## category_brand_sold category_visits ty_visits category_basket
## Min. : 0.0 Min. : 346.0 Min. : 1 Min. : 0
## 1st Qu.: 0.0 1st Qu.: 656.5 1st Qu.: 1 1st Qu.: 0
## Median : 680.5 Median : 879.0 Median : 1 Median : 0
## Mean : 2996.9 Mean : 3845.8 Mean : 44617481 Mean : 18632
## 3rd Qu.: 5355.5 3rd Qu.: 1343.8 3rd Qu.:102350467 3rd Qu.: 41373
## Max. :28944.0 Max. :59310.0 Max. :178545693 Max. :281022
##
## category_favored w_day mon is_campaign
## Min. : 1242 Min. :1 Min. : 1.000 Min. :0.00000
## 1st Qu.: 2476 1st Qu.:2 1st Qu.: 4.000 1st Qu.:0.00000
## Median : 3286 Median :4 Median : 6.000 Median :0.00000
## Mean : 4208 Mean :4 Mean : 6.463 Mean :0.08663
## 3rd Qu.: 4886 3rd Qu.:6 3rd Qu.: 9.000 3rd Qu.:0.00000
## Max. :44445 Max. :7 Max. :12.000 Max. :1.00000
##
## price sold_count visit_count favored_count basket_count category_sold
## [1,] 112.9000 0 0.0 0.0 0.0 321.0
## [2,] 129.9000 20 0.0 0.0 92.0 608.5
## [3,] 136.2475 57 0.0 171.5 239.5 804.5
## [4,] 141.6109 140 4374.5 594.5 578.0 1103.0
## [5,] 158.1300 315 10777.0 1465.0 1287.0 1799.0
## category_brand_sold category_visits ty_visits category_basket
## [1,] 0.0 346.0 1 0.0
## [2,] 0.0 656.0 1 0.0
## [3,] 680.5 879.0 1 0.0
## [4,] 5357.0 1345.5 102370187 41481.5
## [5,] 12868.0 2348.0 178545693 103254.0
## category_favored w_day
## [1,] 1242.0 1
## [2,] 2475.5 2
## [3,] 3286.5 4
## [4,] 4887.0 6
## [5,] 8278.0 7
The relationship between the attributes and the response variable is examined through the correlation graph.
basket_count, category_visits, and category_favored have high correlations and look reliable in the data summary. However, there are zero values, which are not expected in real life; therefore, the zeros are replaced with the mean.
ty_visits also takes the value 1 before a particular date, and those values are replaced with the mean of ty_visits.
Some price values are NA; they are replaced with the mean price, since the price does not change significantly over time.
In the end, "price", "visit_count", "basket_count", "category_favored", "ty_visits", and "is_campaign" are chosen as regressors.
The predictions are based on the previous observations' attribute values, since the real attributes are not available at prediction time.
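The cleaning rules above might look like the following sketch; the values are toy numbers, and replacing "bad" entries with the column mean is the rule stated in the text.

```r
# Assumed sketch of the attribute cleaning: NAs, unexpected zeros, and the
# placeholder value 1 in ty_visits are replaced with the mean of the rest.
data <- data.frame(price       = c(130, NA, 136, 142),
                   visit_count = c(0, 4300, 5100, 0),
                   ty_visits   = c(1, 1, 102e6, 178e6))

fill_mean <- function(x, bad) {   # replace flagged entries by the mean of the others
  x[bad] <- mean(x[!bad])
  x
}
data$price       <- fill_mean(data$price,       is.na(data$price))
data$visit_count <- fill_mean(data$visit_count, data$visit_count == 0)
data$ty_visits   <- fill_mean(data$ty_visits,   data$ty_visits == 1)
```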
Model Construction
The data does not have constant variance; therefore, besides the simple linear model, sqrt and Box-Cox transformations are used for the regression model.
Simple Linear Regression with No Transformation
Through many iterations, it is seen that the day factor is not significant, contrary to what was expected.
##
## Call:
## lm(formula = sold_count ~ price + visit_count + basket_count +
## category_basket + factor(mon) + factor(is_campaign) + trend +
## lag1 + lag3, data = train7)
##
## Residuals:
## Min 1Q Median 3Q Max
## -120.315 -9.293 -0.319 7.725 121.574
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.476e+01 3.327e+01 1.947 0.052322 .
## price -7.719e-01 2.386e-01 -3.236 0.001322 **
## visit_count -1.034e-02 1.641e-03 -6.299 8.55e-10 ***
## basket_count 2.255e-01 1.013e-02 22.268 < 2e-16 ***
## category_basket 2.710e-04 8.130e-05 3.334 0.000944 ***
## factor(mon)2 -8.302e+00 8.171e+00 -1.016 0.310287
## factor(mon)3 -1.868e+01 7.361e+00 -2.537 0.011594 *
## factor(mon)4 -1.234e+01 8.376e+00 -1.474 0.141467
## factor(mon)5 2.579e+01 8.131e+00 3.171 0.001644 **
## factor(mon)6 2.349e+01 6.665e+00 3.524 0.000479 ***
## factor(mon)7 2.550e+01 7.484e+00 3.408 0.000727 ***
## factor(mon)8 1.680e+01 7.168e+00 2.344 0.019617 *
## factor(mon)9 -1.049e+00 7.681e+00 -0.137 0.891425
## factor(mon)10 -4.350e-01 6.756e+00 -0.064 0.948696
## factor(mon)11 3.925e+00 6.535e+00 0.601 0.548474
## factor(mon)12 -3.041e+00 5.781e+00 -0.526 0.599142
## factor(is_campaign)1 1.590e+00 4.738e+00 0.335 0.737444
## trend 1.819e-01 2.528e-02 7.195 3.52e-12 ***
## lag1 1.734e-01 2.752e-02 6.302 8.38e-10 ***
## lag3 4.396e-02 2.184e-02 2.013 0.044846 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.41 on 369 degrees of freedom
## Multiple R-squared: 0.9505, Adjusted R-squared: 0.9479
## F-statistic: 372.9 on 19 and 369 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 23
##
## data: Residuals
## LM test = 59.442, df = 23, p-value = 4.599e-05
The residual analysis of the untransformed lm model looks good, with residuals around mean zero and no strong autocorrelation; however, the variability of the errors is higher at higher values.
Simple Linear Regression with sqrt() Transformation
Through many iterations, the set of regressors shown in the model below was selected.
##
## Call:
## lm(formula = sqrt ~ price + visit_count + basket_count + ty_visits +
## factor(mon) + lag1 + factor(is_campaign) + category_visits +
## category_basket, data = train7)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0284 -0.6227 0.0620 0.6736 3.7829
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.415e+01 1.818e+00 7.785 7.09e-14 ***
## price -7.308e-02 1.230e-02 -5.942 6.51e-09 ***
## visit_count -9.948e-04 9.367e-05 -10.620 < 2e-16 ***
## basket_count 1.137e-02 5.484e-04 20.728 < 2e-16 ***
## ty_visits 4.261e-08 4.811e-09 8.856 < 2e-16 ***
## factor(mon)2 -3.184e-01 4.806e-01 -0.662 0.508094
## factor(mon)3 -4.883e-01 4.251e-01 -1.148 0.251532
## factor(mon)4 -5.228e-01 4.786e-01 -1.092 0.275354
## factor(mon)5 -7.008e-01 4.598e-01 -1.524 0.128294
## factor(mon)6 -1.601e+00 3.481e-01 -4.600 5.81e-06 ***
## factor(mon)7 -1.903e+00 3.514e-01 -5.415 1.10e-07 ***
## factor(mon)8 -1.660e+00 3.690e-01 -4.498 9.22e-06 ***
## factor(mon)9 -1.587e+00 4.211e-01 -3.769 0.000191 ***
## factor(mon)10 -1.692e+00 3.656e-01 -4.627 5.14e-06 ***
## factor(mon)11 -1.183e+00 3.550e-01 -3.333 0.000947 ***
## factor(mon)12 -3.677e-01 3.134e-01 -1.173 0.241425
## lag1 1.216e-02 1.313e-03 9.265 < 2e-16 ***
## factor(is_campaign)1 -2.645e-01 2.579e-01 -1.026 0.305615
## category_visits 4.094e-05 1.236e-05 3.312 0.001018 **
## category_basket -9.950e-06 4.780e-06 -2.082 0.038066 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.217 on 369 degrees of freedom
## Multiple R-squared: 0.9401, Adjusted R-squared: 0.937
## F-statistic: 304.7 on 19 and 369 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 23
##
## data: Residuals
## LM test = 94.014, df = 23, p-value = 1.492e-10
The residual analysis shows significant autocorrelation at lag1 around mean zero, and the variability of the errors is higher at higher values. This model is poorer than the lm model with no transformation.
Simple Linear Regression with Box-Cox Transformation
Through many iterations, it is seen that the day factor and lag3 are not significant for the Box-Cox linear model, while lag7 is; therefore, the insignificant terms are excluded. category_basket is significant for the Box-Cox transformation.
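The Box-Cox step can be sketched as follows, assuming the usual profile-likelihood choice of lambda via MASS::boxcox (the report's exact lambda is not shown, and the data here are simulated).

```r
library(MASS)   # boxcox() comes with R's recommended packages

set.seed(3)
y <- rexp(200, rate = 0.02) + 1   # positive, right-skewed stand-in for sold_count
x <- 0.1 * y + rnorm(200)         # toy regressor

bc     <- boxcox(lm(y ~ x), plotit = FALSE)
lambda <- bc$x[which.max(bc$y)]   # lambda maximising the profile likelihood

# standard Box-Cox transform (log when lambda is essentially zero)
y_bc <- if (abs(lambda) < 1e-6) log(y) else (y^lambda - 1) / lambda
fit  <- lm(y_bc ~ x)
```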
##
## Call:
## lm(formula = BoxCox ~ price + visit_count + basket_count + category_favored +
## ty_visits + factor(mon) + lag1 + lag7 + factor(is_campaign) +
## category_basket, data = train7)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9652 -0.4527 0.1688 0.6840 2.5776
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.395e+01 2.073e+00 6.729 6.57e-11 ***
## price -7.492e-02 1.411e-02 -5.309 1.91e-07 ***
## visit_count -6.764e-04 1.138e-04 -5.943 6.49e-09 ***
## basket_count 6.016e-03 6.815e-04 8.828 < 2e-16 ***
## category_favored 1.495e-04 3.967e-05 3.768 0.000191 ***
## ty_visits 4.365e-08 4.450e-09 9.808 < 2e-16 ***
## factor(mon)2 2.850e-01 5.531e-01 0.515 0.606758
## factor(mon)3 -2.757e-02 4.771e-01 -0.058 0.953947
## factor(mon)4 -7.028e-01 5.234e-01 -1.343 0.180211
## factor(mon)5 -1.665e+00 5.160e-01 -3.227 0.001364 **
## factor(mon)6 -2.033e+00 3.871e-01 -5.253 2.54e-07 ***
## factor(mon)7 -2.211e+00 4.032e-01 -5.484 7.75e-08 ***
## factor(mon)8 -1.558e+00 4.208e-01 -3.702 0.000246 ***
## factor(mon)9 -1.550e+00 4.797e-01 -3.230 0.001348 **
## factor(mon)10 -1.507e+00 4.150e-01 -3.631 0.000322 ***
## factor(mon)11 -1.351e+00 4.104e-01 -3.291 0.001095 **
## factor(mon)12 -3.002e-01 3.570e-01 -0.841 0.400932
## lag1 7.618e-03 1.497e-03 5.088 5.78e-07 ***
## lag7 3.144e-03 1.180e-03 2.665 0.008048 **
## factor(is_campaign)1 -4.970e-01 3.062e-01 -1.623 0.105370
## category_basket -3.579e-05 6.827e-06 -5.243 2.68e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.382 on 368 degrees of freedom
## Multiple R-squared: 0.8266, Adjusted R-squared: 0.8172
## F-statistic: 87.74 on 20 and 368 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 24
##
## data: Residuals
## LM test = 178.01, df = 24, p-value < 2.2e-16
According to the residual analysis, the Box-Cox model has large deviations over time, and its adjusted R-squared value is lower than the others'.
ARIMA Models
When the ARIMA models are constructed, the auto.arima function is used and re-run every day. Seasonality is set to TRUE, and the frequency is determined as seven by observing the ACF and PACF graphs.
The additive model, the multiplicative model, and a linear regression model are used for decomposition to obtain stationary data.
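The three decomposition options can be sketched on a simulated daily series with weekly frequency (stand-in data, not the toothbrush sales); each resulting series would then be checked for stationarity with a KPSS test, as in the output below.

```r
set.seed(11)
day <- 1:280   # simulated daily sales with trend and weekly seasonality
sold_ts <- ts(100 + 0.2 * day + 10 * sin(2 * pi * day / 7) + rnorm(280, sd = 5),
              frequency = 7)

# additive and multiplicative classical decomposition
add_random  <- na.omit(decompose(sold_ts, type = "additive")$random)
mult_random <- na.omit(decompose(sold_ts, type = "multiplicative")$random)

# linear-regression "decomposition": residuals of a trend regression
lm_resid <- residuals(lm(as.numeric(sold_ts) ~ day))
```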
## [1] "The Additive Model"
##
## #######################################
## # KPSS Unit Root / Cointegration Test #
## #######################################
##
## The value of the test statistic is: 0.0074
## [1] "The Multiplicative Model"
##
## #######################################
## # KPSS Unit Root / Cointegration Test #
## #######################################
##
## The value of the test statistic is: 0.2026
## [1] "Linear Regression"
##
## #######################################
## # KPSS Unit Root / Cointegration Test #
## #######################################
##
## The value of the test statistic is: 0.024
The multiplicative decomposition does not yield a stationary series; therefore, the additive decomposition will be used for the ARIMA and ARIMA-with-regressors models.
The residuals of the linear regression model are stationary; therefore, an ARIMA model is fitted on the residuals, and the two are combined in the end.
The regressors mentioned above are used for the ARIMA model with regressors.
## Series: decomposed$random
## ARIMA(0,0,1)(0,0,2)[7] with non-zero mean
##
## Coefficients:
## ma1 sma1 sma2 mean
## 0.3223 0.0894 -0.0959 0.0053
## s.e. 0.0479 0.0523 0.0518 2.2771
##
## sigma^2 estimated as 1160: log likelihood=-1892.85
## AIC=3795.7 AICc=3795.86 BIC=3815.44
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,0,2)[7] with non-zero mean
## Q* = 54.534, df = 10, p-value = 3.858e-08
##
## Model df: 4. Total lags used: 14
Observing the plots, the PACF is significant at lag1 and the ACF drops after lag1; therefore, it is reasonable that auto.arima gives MA(1). The PACF and ACF are also significant at seasonal lag 2, so the seasonal order (0,0,2) is reasonable, too.
## Series: decomposed$random
## Regression with ARIMA(5,1,1) errors
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ma1 xreg
## 0.1616 -0.3492 -0.3004 -0.0625 -0.2538 -0.9816 -0.5042
## s.e. 0.0503 0.0506 0.0513 0.0505 0.0504 0.0143 0.2151
##
## sigma^2 estimated as 932.9: log likelihood=-1847.27
## AIC=3710.55 AICc=3710.94 BIC=3742.11
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(5,1,1) errors
## Q* = 19.474, df = 7, p-value = 0.006825
##
## Model df: 7. Total lags used: 14
## [1] 3710.549
By residual analysis, the arima with regressors has no autocorrelated residuals and lower AIC, therefore arima with regressors is better model than arima.
Arima Combined with Linear Regression
## Series: residuals
## ARIMA(0,0,3) with zero mean
##
## Coefficients:
## ma1 ma2 ma3
## 0.1704 0.1537 0.0900
## s.e. 0.0503 0.0508 0.0536
##
## sigma^2 estimated as 453.5: log likelihood=-1740.24
## AIC=3488.48 AICc=3488.59 BIC=3504.34
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,3) with zero mean
## Q* = 3.569, df = 7, p-value = 0.8279
##
## Model df: 3. Total lags used: 10
The auto.arima model on the regression residuals has zero mean, no autocorrelated residuals, and the lowest AIC value, so by residual analysis it is better than both the ARIMA and ARIMA-with-regressors models.
Predictions
The predictions are based on the last available attributes and are plotted against the actual sales values.
## event_date actual sqrt_forecasted_sold BoxCox_forecasted_sold
## 1: 2021-06-18 108 75.30678 38.61865
## 2: 2021-06-19 104 85.18752 51.05261
## 3: 2021-06-20 149 142.34503 109.50590
## 4: 2021-06-21 128 116.80589 116.12060
## 5: 2021-06-22 56 97.96617 86.86636
## 6: 2021-06-23 59 65.41016 57.38209
## 7: 2021-06-24 56 63.05782 54.50673
## 8: 2021-06-25 36 55.53486 45.66327
## 9: 2021-06-26 40 52.72771 41.84033
## 10: 2021-06-27 46 72.90609 72.81313
## 11: 2021-06-28 64 73.59562 67.60311
## 12: 2021-06-29 137 120.37701 114.68402
## 13: 2021-06-30 131 133.14290 129.73114
## 14: 2021-07-01 130 106.68231 90.71156
## lm_forecasted_sold forecasted_lm7_arima add_arima_forecasted
## 1: 130.99117 129.75684 159.88214
## 2: 122.10168 116.96945 151.61943
## 3: 156.84938 152.07316 145.17374
## 4: 130.65973 126.83063 158.80660
## 5: 108.49405 107.28952 154.81502
## 6: 82.48194 74.54178 134.35688
## 7: 77.32646 67.77471 120.78775
## 8: 69.69601 61.49438 97.99568
## 9: 70.19335 63.37959 74.91833
## 10: 67.03158 58.76225 57.04774
## 11: 79.05786 71.53833 52.04136
## 12: 131.06642 126.11501 55.21875
## 13: 140.91006 140.62089 74.90336
## 14: 139.55967 139.03486 87.46261
## reg_add_arima_forecasted
## 1: 143.40567
## 2: 132.43593
## 3: 149.56207
## 4: 176.87823
## 5: 132.44178
## 6: 140.35844
## 7: 109.98056
## 8: 88.86555
## 9: 79.87113
## 10: 67.76062
## 11: 56.92694
## 12: 51.75578
## 13: 74.92831
## 14: 84.87210
Error Rates of Models
## model n mean sd CV FBias
## 1: sqrt_forecasted_sold 14 88.85714 41.34072 0.4652492 -0.01370246
## 2: BoxCox_forecasted_sold 14 88.85714 41.34072 0.4652492 0.13416438
## 3: lm_forecasted_sold 14 88.85714 41.34072 0.4652492 -0.21094803
## 4: forecasted_lm7_arima 14 88.85714 41.34072 0.4652492 -0.15448667
## 5: add_arima_forecasted 14 88.85714 41.34072 0.4652492 -0.22590787
## 6: reg_add_arima_forecasted 14 88.85714 41.34072 0.4652492 -0.19778385
## MAPE RMSE MAD MADP WMAPE
## 1: 0.2508952 20.05112 16.83120 0.1894186 0.1894186
## 2: 0.2530779 30.64493 22.31949 0.2511840 0.2511840
## 3: 0.3394596 23.34176 19.59189 0.2204876 0.2204876
## 4: 0.2611267 19.58886 15.44930 0.1738667 0.1738667
## 5: 0.6984139 55.12053 48.10213 0.5413423 0.5413423
## 6: 0.6529313 51.53582 45.21977 0.5089042 0.5089042
Since the ARIMA model combined with linear regression has the lowest WMAPE value, it is selected for prediction. However, each day the error rates are recalculated over the last 14 days, and the model whose predictions have the lowest WMAPE is selected for that day.
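The daily selection rule can be sketched as follows; the forecast values below are illustrative toys, only the WMAPE formula and the argmin step mirror the procedure described above:

```r
# WMAPE = sum(|actual - forecast|) / sum(|actual|)
wmape <- function(actual, forecasted) {
  sum(abs(actual - forecasted)) / sum(abs(actual))
}

actual <- c(108, 104, 149, 128, 56, 59, 56, 36, 40, 46, 64, 137, 131, 130)
preds  <- list(lm7_arima = actual + 10,    # toy forecasts for the last 14 days
               add_arima = actual * 1.5)

scores <- sapply(preds, function(p) wmape(actual, p))
best   <- names(which.min(scores))   # the model used for the next day's prediction
```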
Predictions of Next Day
## add_arima xreg_add_arima forecast_lm forecast_lm_arima
## 87.70957 83.45005 141.31027 143.38729
## BoxCox_lm Sqrt_lm
## 93.26515 108.00000
Altinyildiz Classics Jacket
The sales are zero most of the time; however, there is a huge increase in October.
The ACF and PACF of the data show significant autocorrelation at lag 1 and lag 7.
Examination of Attributes
The correlation of price, visit_count, and basket_count with sales is high, and it is expected that these variables can be zero when sold_count is zero.
However, category_favored and ty_visits are not expected to be zero or one, so those values are replaced with the mean.
## price event_date product_content_id sold_count
## Min. : -1.0 Min. :2020-05-25 Length:404 Min. : 0.0000
## 1st Qu.:350.0 1st Qu.:2020-09-02 Class :character 1st Qu.: 0.0000
## Median :600.0 Median :2020-12-12 Mode :character Median : 0.0000
## Mean :557.9 Mean :2020-12-12 Mean : 0.9233
## 3rd Qu.:736.6 3rd Qu.:2021-03-23 3rd Qu.: 0.0000
## Max. :833.3 Max. :2021-07-02 Max. :52.0000
## NA's :303
## visit_count favored_count basket_count category_sold
## Min. : 0.00 Min. : 0.000 Min. : 0.000 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 16.0
## Median : 0.00 Median : 0.000 Median : 0.000 Median : 45.0
## Mean : 27.07 Mean : 2.238 Mean : 5.819 Mean : 198.6
## 3rd Qu.: 3.00 3rd Qu.: 2.000 3rd Qu.: 5.000 3rd Qu.: 108.8
## Max. :516.00 Max. :37.000 Max. :247.000 Max. :3299.0
##
## category_brand_sold category_visits ty_visits category_basket
## Min. : 0 Min. : 367 Min. : 1 Min. : 0
## 1st Qu.: 0 1st Qu.: 1424 1st Qu.: 1 1st Qu.: 0
## Median : 6 Median : 5305 Median : 1 Median : 0
## Mean : 46361 Mean : 27422 Mean : 44617481 Mean : 353883
## 3rd Qu.: 94565 3rd Qu.: 9521 3rd Qu.:102350467 3rd Qu.: 469103
## Max. :259590 Max. :583672 Max. :178545693 Max. :3102147
##
## category_favored w_day mon is_campaign
## Min. : 2324 Min. :1 Min. : 1.000 Min. :0.00000
## 1st Qu.: 8562 1st Qu.:2 1st Qu.: 4.000 1st Qu.:0.00000
## Median : 24608 Median :4 Median : 6.000 Median :0.00000
## Mean : 33744 Mean :4 Mean : 6.463 Mean :0.08663
## 3rd Qu.: 50363 3rd Qu.:6 3rd Qu.: 9.000 3rd Qu.:0.00000
## Max. :244883 Max. :7 Max. :12.000 Max. :1.00000
##
## price sold_count visit_count favored_count basket_count category_sold
## [1,] -1.00 0 0 0 0 0.0
## [2,] 349.99 0 0 0 0 16.0
## [3,] 599.98 0 0 0 0 45.0
## [4,] 736.64 0 3 2 5 109.5
## [5,] 833.32 0 7 5 12 248.0
## category_brand_sold category_visits ty_visits category_basket
## [1,] 0 367.0 1 0
## [2,] 0 1417.0 1 0
## [3,] 6 5305.0 1 0
## [4,] 94567 9526.5 102370187 473826
## [5,] 235840 21187.0 178545693 1177469
## category_favored w_day
## [1,] 2324.0 1
## [2,] 8506.5 2
## [3,] 24608.0 4
## [4,] 50385.0 6
## [5,] 111346.0 7
Considering correlation and variable reliability, price, visit_count, basket_count, and category_favored are selected as regressors.
The ACF and PACF graphs show high correlation at lag 1, lag 2, lag 5, and lag 7, so these lags are added as attributes.
Since the jacket is an expensive product, consumers are expected to consider its previous prices; therefore, lagged prices of the jacket are examined as well.
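These lag attributes can be built with `data.table::shift`; a sketch, with `train8` as an illustrative table name:

```r
library(data.table)

train8 <- as.data.table(train8)
train8[, `:=`(lag1 = shift(sold_count, 1),     # autoregressive lags from the ACF/PACF
              lag2 = shift(sold_count, 2),
              lag5 = shift(sold_count, 5),
              lag7 = shift(sold_count, 7),
              price_lag_4 = shift(price, 4))]  # lagged price of the jacket
```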
The predictions are based on the attributes of previous observations, since the real attributes are not available at prediction time.
Model Construction
The data does not have constant variance; therefore, besides the simple linear model, sqrt and Box-Cox transformations are used for the regression model.
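A sketch of the two transformed variants (column and object names are illustrative; `sold_count + 1` keeps the Box-Cox response positive, since the series contains zeros):

```r
library(MASS)   # boxcox()

# sqrt model: fit on the square root, back-transform predictions by squaring
train8$sqrt <- sqrt(train8$sold_count)
fit_sqrt    <- lm(sqrt ~ price + basket_count, data = train8)
pred_sqrt   <- predict(fit_sqrt, newdata = test8)^2

# Box-Cox: pick the lambda that maximises the profile likelihood, then transform
bc     <- boxcox(sold_count + 1 ~ price + basket_count, data = train8, plotit = FALSE)
lambda <- bc$x[which.max(bc$y)]
train8$BoxCox <- ((train8$sold_count + 1)^lambda - 1) / lambda
```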
Simple Regression
After many iterations, the most significant variables are price, visit_count, basket_count, category_favored, factor(w_day), factor(mon), lag1, lag2, and price_lag_4.
##
## Call:
## lm(formula = sold_count ~ price + visit_count + basket_count +
## category_favored + factor(w_day) + factor(mon) + lag1 + lag2 +
## price_lag_4, data = train8)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.7090 -0.2920 -0.0360 0.3066 6.5977
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.389e+00 3.511e-01 3.957 9.14e-05 ***
## price 1.416e-03 4.236e-04 3.343 0.000915 ***
## visit_count 1.204e-03 1.175e-03 1.024 0.306421
## basket_count 1.880e-01 4.495e-03 41.812 < 2e-16 ***
## category_favored -2.514e-05 3.484e-06 -7.214 3.17e-12 ***
## factor(w_day)2 4.494e-01 2.255e-01 1.993 0.046974 *
## factor(w_day)3 3.319e-01 2.267e-01 1.464 0.144090
## factor(w_day)4 5.809e-01 2.267e-01 2.562 0.010809 *
## factor(w_day)5 4.630e-01 2.285e-01 2.027 0.043435 *
## factor(w_day)6 2.596e-01 2.287e-01 1.135 0.257099
## factor(w_day)7 1.589e-01 2.264e-01 0.702 0.483234
## factor(mon)2 -5.283e-02 3.156e-01 -0.167 0.867132
## factor(mon)3 -4.204e-01 3.092e-01 -1.360 0.174786
## factor(mon)4 -8.103e-01 3.275e-01 -2.474 0.013805 *
## factor(mon)5 -1.083e+00 3.565e-01 -3.037 0.002559 **
## factor(mon)6 -1.580e+00 3.553e-01 -4.448 1.15e-05 ***
## factor(mon)7 -1.510e+00 3.717e-01 -4.062 5.96e-05 ***
## factor(mon)8 -1.396e+00 3.642e-01 -3.832 0.000149 ***
## factor(mon)9 -1.271e+00 3.529e-01 -3.600 0.000362 ***
## factor(mon)10 7.597e-01 4.539e-01 1.674 0.095031 .
## factor(mon)11 -1.493e+00 3.547e-01 -4.211 3.21e-05 ***
## factor(mon)12 -8.584e-02 3.053e-01 -0.281 0.778747
## lag1 2.452e-03 2.144e-02 0.114 0.909006
## lag2 -7.755e-02 2.114e-02 -3.669 0.000280 ***
## price_lag_4 -2.380e-03 3.866e-04 -6.155 1.99e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.186 on 364 degrees of freedom
## Multiple R-squared: 0.8921, Adjusted R-squared: 0.885
## F-statistic: 125.4 on 24 and 364 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 28
##
## data: Residuals
## LM test = 148.81, df = 28, p-value < 2.2e-16
Simple Linear Regression with sqrt() Transformation
After many iterations, price_lag_4 and lag2 are not significant in the sqrt-transformed model, while lag5 is significant.
##
## Call:
## lm(formula = sqrt ~ price + visit_count + basket_count + category_favored +
## factor(w_day) + factor(mon) + lag1 + lag5, data = train8)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.12605 -0.07060 0.00275 0.05667 1.38137
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.370e-01 8.516e-02 -1.609 0.108438
## price 1.857e-03 1.034e-04 17.952 < 2e-16 ***
## visit_count 1.832e-03 2.862e-04 6.399 4.79e-10 ***
## basket_count 2.600e-02 1.102e-03 23.586 < 2e-16 ***
## category_favored -2.486e-06 8.787e-07 -2.829 0.004932 **
## factor(w_day)2 1.254e-01 5.586e-02 2.244 0.025416 *
## factor(w_day)3 7.681e-02 5.590e-02 1.374 0.170300
## factor(w_day)4 9.302e-02 5.588e-02 1.665 0.096815 .
## factor(w_day)5 4.764e-02 5.629e-02 0.846 0.397907
## factor(w_day)6 3.585e-02 5.621e-02 0.638 0.524020
## factor(w_day)7 1.142e-02 5.586e-02 0.205 0.838048
## factor(mon)2 -1.245e-01 7.782e-02 -1.599 0.110577
## factor(mon)3 -6.330e-02 7.628e-02 -0.830 0.407146
## factor(mon)4 -9.575e-02 8.092e-02 -1.183 0.237477
## factor(mon)5 5.763e-02 8.818e-02 0.654 0.513796
## factor(mon)6 -1.786e-01 8.796e-02 -2.031 0.042991 *
## factor(mon)7 -1.623e-01 9.222e-02 -1.759 0.079338 .
## factor(mon)8 -1.519e-01 9.033e-02 -1.682 0.093455 .
## factor(mon)9 -1.531e-01 8.745e-02 -1.750 0.080882 .
## factor(mon)10 1.325e-02 1.041e-01 0.127 0.898775
## factor(mon)11 -2.991e-01 8.470e-02 -3.531 0.000466 ***
## factor(mon)12 -7.630e-02 7.495e-02 -1.018 0.309349
## lag1 2.297e-02 5.201e-03 4.417 1.32e-05 ***
## lag5 -8.736e-03 4.947e-03 -1.766 0.078278 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2925 on 365 degrees of freedom
## Multiple R-squared: 0.8944, Adjusted R-squared: 0.8878
## F-statistic: 134.5 on 23 and 365 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 27
##
## data: Residuals
## LM test = 137.99, df = 27, p-value < 2.2e-16
The residual analysis shows no significant difference, and the adjusted R-squared of the sqrt-transformed model is higher.
Simple Linear Regression Model with BoxCox Transformation
After many iterations, price, visit_count, basket_count, category_favored, factor(w_day), factor(mon), and lag1 are the most significant variables for the BoxCox-transformed model.
##
## Call:
## lm(formula = BoxCox ~ price + visit_count + basket_count + category_favored +
## factor(w_day) + factor(mon) + lag1, data = train8)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3536 -0.1568 -0.0167 0.1038 3.2974
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.179e+00 2.210e-01 -27.956 < 2e-16 ***
## price 8.694e-03 2.683e-04 32.408 < 2e-16 ***
## visit_count 7.182e-03 7.454e-04 9.636 < 2e-16 ***
## basket_count 3.019e-02 2.854e-03 10.580 < 2e-16 ***
## category_favored -1.417e-06 2.236e-06 -0.634 0.5267
## factor(w_day)2 2.386e-01 1.452e-01 1.643 0.1013
## factor(w_day)3 1.673e-01 1.459e-01 1.146 0.2524
## factor(w_day)4 1.286e-01 1.460e-01 0.881 0.3791
## factor(w_day)5 1.257e-02 1.471e-01 0.085 0.9319
## factor(w_day)6 9.155e-02 1.468e-01 0.624 0.5333
## factor(w_day)7 -1.666e-02 1.459e-01 -0.114 0.9092
## factor(mon)2 -5.102e-01 2.033e-01 -2.510 0.0125 *
## factor(mon)3 -1.734e-01 1.991e-01 -0.871 0.3844
## factor(mon)4 -1.763e-01 2.108e-01 -0.836 0.4035
## factor(mon)5 9.359e-01 2.291e-01 4.086 5.40e-05 ***
## factor(mon)6 -1.883e-01 2.283e-01 -0.825 0.4099
## factor(mon)7 -2.038e-01 2.390e-01 -0.853 0.3943
## factor(mon)8 -1.983e-01 2.342e-01 -0.847 0.3977
## factor(mon)9 -2.341e-01 2.270e-01 -1.031 0.3032
## factor(mon)10 -2.322e-01 2.712e-01 -0.856 0.3925
## factor(mon)11 -5.278e-01 2.150e-01 -2.455 0.0146 *
## factor(mon)12 -1.966e-01 1.958e-01 -1.004 0.3159
## lag1 5.606e-02 1.350e-02 4.153 4.09e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7642 on 366 degrees of freedom
## Multiple R-squared: 0.919, Adjusted R-squared: 0.9142
## F-statistic: 188.9 on 22 and 366 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 26
##
## data: Residuals
## LM test = 119.14, df = 26, p-value = 6.999e-14
In the residual analysis and the adjusted R-squared comparison, the BoxCox model is better than the others; however, it is very sensitive to back-transformation, so its predictions may be poor.
Arima Models
The ARIMA models are constructed with the auto.arima function, which is re-run every day; seasonality is set to TRUE, and the frequency is determined as seven by observing the ACF and PACF graphs.
The additive model, the multiplicative model, and the linear regression model are used for decomposition to obtain stationary data.
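A sketch of this setup, assuming the `forecast` package and an illustrative `train` table (`na.omit` drops the NAs that `decompose` leaves at the series edges):

```r
library(forecast)

sales_ts   <- ts(train$sold_count, frequency = 7)     # weekly frequency from ACF/PACF
decomposed <- decompose(sales_ts, type = "additive")  # additive decomposition
fit        <- auto.arima(na.omit(decomposed$random),  # re-run each day on updated data
                         seasonal = TRUE)
```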
## [1] "The Additive Model"
##
## #######################################
## # KPSS Unit Root / Cointegration Test #
## #######################################
##
## The value of the test statistic is: 0.0092
## [1] "The Multiplicative Model"
##
## #######################################
## # KPSS Unit Root / Cointegration Test #
## #######################################
##
## The value of the test statistic is: 0.0894
## [1] "Linear Regression"
##
## #######################################
## # KPSS Unit Root / Cointegration Test #
## #######################################
##
## The value of the test statistic is: 0.0127
The multiplicative model is significant at the α = 0.10 level, so the additive decomposition is used for the ARIMA and ARIMA-with-regressors models.
The linear regression residuals are stationary, so an ARIMA model is fitted to them and the two forecasts are combined at the end.
The regressors mentioned above are used for the ARIMA model with regressors.
Arima
## Series: decomposed$random
## ARIMA(5,0,0) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5
## -0.1946 -0.4366 -0.4028 -0.3270 -0.1579
## s.e. 0.0505 0.0486 0.0493 0.0484 0.0503
##
## sigma^2 estimated as 5.204: log likelihood=-857.33
## AIC=1726.67 AICc=1726.89 BIC=1750.36
##
## Ljung-Box test
##
## data: Residuals from ARIMA(5,0,0) with zero mean
## Q* = 53.984, df = 9, p-value = 1.901e-08
##
## Model df: 5. Total lags used: 14
Arima with Regressor
## Series: decomposed$random
## Regression with ARIMA(0,0,0)(0,0,2)[7] errors
##
## Coefficients:
## sma1 sma2 intercept xreg
## 0.1987 -0.1041 -0.3066 0.0013
## s.e. 0.0506 0.0502 0.2076 0.0006
##
## sigma^2 estimated as 6.891: log likelihood=-911.34
## AIC=1832.67 AICc=1832.83 BIC=1852.41
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(0,0,0)(0,0,2)[7] errors
## Q* = 99.899, df = 10, p-value < 2.2e-16
##
## Model df: 4. Total lags used: 14
Arima Combined with Linear Regression
## Series: residuals
## ARIMA(5,0,1) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ma1
## 0.9452 0.1830 -0.1834 -0.3084 0.1771 -0.944
## s.e. 0.0664 0.0679 0.0695 0.0682 0.0594 0.040
##
## sigma^2 estimated as 1.097: log likelihood=-567.5
## AIC=1148.99 AICc=1149.29 BIC=1176.74
##
## Ljung-Box test
##
## data: Residuals from ARIMA(5,0,1) with zero mean
## Q* = 2.9941, df = 4, p-value = 0.5588
##
## Model df: 6. Total lags used: 10
Predictions
All models are used for prediction, including mul_arima and reg_mul_arima, since they are significant at α = 0.10.
## event_date actual sqrt_forecasted_sold BoxCox_forecasted_sold
## 1: 2021-06-18 3 1 0
## 2: 2021-06-19 0 0 0
## 3: 2021-06-20 1 2 3
## 4: 2021-06-21 2 2 2
## 5: 2021-06-22 2 1 0
## 6: 2021-06-23 2 1 0
## 7: 2021-06-24 2 1 0
## 8: 2021-06-25 2 1 0
## 9: 2021-06-26 1 0 0
## 10: 2021-06-27 0 0 0
## 11: 2021-06-28 4 1 0
## 12: 2021-06-29 1 3 4
## 13: 2021-06-30 0 0 0
## 14: 2021-07-01 1 1 1
## lm_forecasted_sold forecasted_lm8_arima add_arima_forecasted
## 1: -1 0 2
## 2: -1 -1 2
## 3: 1 2 3
## 4: 1 2 3
## 5: 1 0 2
## 6: 1 1 2
## 7: 0 0 2
## 8: 0 0 1
## 9: 0 1 1
## 10: -1 0 1
## 11: 1 1 1
## 12: 2 2 2
## 13: -1 -1 2
## 14: 2 2 3
## mul_arima_forecasted reg_add_arima_forecasted reg_mul_arima_forecasted
## 1: 2 2 0
## 2: 2 2 0
## 3: 1 3 0
## 4: 2 3 5
## 5: 2 2 5
## 6: 2 2 3
## 7: 2 2 -1
## 8: 1 1 0
## 9: 1 1 1
## 10: 1 1 1
## 11: 1 1 1
## 12: 2 2 2
## 13: 2 2 6
## 14: 2 3 0
Error Rates
## model n mean sd CV FBias MAPE RMSE
## 1: sqrt_forecasted_sold 14 1.5 1.160239 0.7734925 0.3333333 NaN 1.281740
## 2: BoxCox_forecasted_sold 14 1.5 1.160239 0.7734925 0.5238095 NaN 1.982062
## 3: lm_forecasted_sold 14 1.5 1.160239 0.7734925 0.7619048 Inf 1.732051
## 4: forecasted_lm8_arima 14 1.5 1.160239 0.7734925 0.5714286 NaN 1.603567
## 5: add_arima_forecasted 14 1.5 1.160239 0.7734925 -0.2857143 Inf 1.463850
## 6: mul_arima_forecasted 14 1.5 1.160239 0.7734925 -0.0952381 Inf 1.253566
## 7: reg_add_arima_forecasted 14 1.5 1.160239 0.7734925 -0.2857143 Inf 1.463850
## 8: reg_mul_arima_forecasted 14 1.5 1.160239 0.7734925 -0.0952381 NaN 2.535463
## MAD MADP WMAPE
## 1: 0.9285714 0.6190476 0.6190476
## 2: 1.5000000 1.0000000 1.0000000
## 3: 1.4285714 0.9523810 0.9523810
## 4: 1.2857143 0.8571429 0.8571429
## 5: 1.1428571 0.7619048 0.7619048
## 6: 0.8571429 0.5714286 0.5714286
## 7: 1.1428571 0.7619048 0.7619048
## 8: 2.0000000 1.3333333 1.3333333
The error rates are very high; however, this is expected because the range of the response variable is very narrow. For example, if the actual sales are 1 and the prediction is 2, the error rate is 100%.
The mul_arima_forecasted model has the lowest error rate.
Next Day Prediction
Each day, the error rates are calculated over the last 14 days, and the model whose predictions have the lowest WMAPE value is selected.
## add_arima mul_arima xreg_mul_arima xreg_add_arima
## 1.1646832 1.0368527 0.7060850 1.2623962
## forecast_lm forecast_lm_arima.1 BoxCox_lm Sqrt_lm
## 0.6887907 0.6155989 1.4901452 1.0000000
TrendyolMilla Bikini Top
In the graph below, the month effect is clearly observable. This is expected, since bikinis are worn during the hot seasons in Turkey. Moreover, the ACF and PACF graphs show a trend in the data and correlation at lag 1 and lag 7.
The price, category_sold, basket_count, and category_favored attributes are more reliable and significantly correlated with the data. Although visit_count and favored_count are highly correlated with the data, they are also correlated with basket_count, so they are not used as regressors.
## price event_date product_content_id sold_count
## Min. :59.99 Min. :2020-05-25 Length:404 Min. : 0.00
## 1st Qu.:59.99 1st Qu.:2020-09-02 Class :character 1st Qu.: 0.00
## Median :59.99 Median :2020-12-12 Mode :character Median : 0.00
## Mean :60.11 Mean :2020-12-12 Mean : 18.39
## 3rd Qu.:59.99 3rd Qu.:2021-03-23 3rd Qu.: 3.00
## Max. :63.55 Max. :2021-07-02 Max. :286.00
## NA's :281
## visit_count favored_count basket_count category_sold
## Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 20.0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 131.8
## Median : 0.0 Median : 0.0 Median : 0.00 Median : 562.5
## Mean : 2460.6 Mean : 241.1 Mean : 88.91 Mean :1290.3
## 3rd Qu.: 578.5 3rd Qu.: 110.5 3rd Qu.: 19.00 3rd Qu.:1664.8
## Max. :45833.0 Max. :5011.0 Max. :1735.00 Max. :8099.0
##
## category_brand_sold category_visits ty_visits category_basket
## Min. : 0 Min. : 107.0 Min. : 1 Min. : 0
## 1st Qu.: 0 1st Qu.: 395.5 1st Qu.: 1 1st Qu.: 0
## Median : 2958 Median : 1360.5 Median : 1 Median : 0
## Mean : 14053 Mean : 80947.3 Mean : 44617481 Mean : 118640
## 3rd Qu.: 15158 3rd Qu.: 2869.5 3rd Qu.:102350467 3rd Qu.: 102690
## Max. :152168 Max. :1335060.0 Max. :178545693 Max. :1230833
##
## category_favored w_day mon is_campaign
## Min. : 628 Min. :1 Min. : 1.000 Min. :0.00000
## 1st Qu.: 2581 1st Qu.:2 1st Qu.: 4.000 1st Qu.:0.00000
## Median : 7788 Median :4 Median : 6.000 Median :0.00000
## Mean : 15181 Mean :4 Mean : 6.463 Mean :0.08663
## 3rd Qu.: 16146 3rd Qu.:6 3rd Qu.: 9.000 3rd Qu.:0.00000
## Max. :135551 Max. :7 Max. :12.000 Max. :1.00000
##
The trend, lag1, lag2, lag3, and lag7 variables are added to the data.
Model Construction
The data does not have constant variance; therefore, besides the simple linear model, sqrt and Box-Cox transformations are used for the regression model.
For Product 9 the attributes are reliable, so all attributes are tried in the model and the most significant ones are selected.
Simple Linear Regression with No Transformation
##
## Call:
## lm(formula = sold_count ~ price + visit_count + basket_count +
## favored_count + category_sold + category_visits + category_basket +
## category_favored + category_brand_sold + factor(w_day) +
## factor(mon) + trend + lag1 + lag3, data = train9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.328 -1.123 -0.017 1.429 31.654
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.531e+02 1.106e+02 -2.289 0.022663 *
## price 4.264e+00 1.852e+00 2.303 0.021875 *
## visit_count -1.070e-03 6.204e-04 -1.725 0.085326 .
## basket_count 2.042e-01 7.701e-03 26.517 < 2e-16 ***
## favored_count -5.007e-03 4.404e-03 -1.137 0.256322
## category_sold 5.077e-03 8.825e-04 5.753 1.88e-08 ***
## category_visits 3.727e-06 8.721e-06 0.427 0.669356
## category_basket 3.370e-05 1.500e-05 2.246 0.025284 *
## category_favored -3.203e-04 8.061e-05 -3.973 8.58e-05 ***
## category_brand_sold -1.533e-04 1.245e-04 -1.232 0.218893
## factor(w_day)2 -1.559e+00 1.129e+00 -1.381 0.168182
## factor(w_day)3 7.781e-01 1.150e+00 0.677 0.499153
## factor(w_day)4 1.227e-01 1.152e+00 0.107 0.915202
## factor(w_day)5 5.869e-02 1.154e+00 0.051 0.959465
## factor(w_day)6 -2.655e-01 1.141e+00 -0.233 0.816057
## factor(w_day)7 5.084e-01 1.138e+00 0.447 0.655289
## factor(mon)2 -6.969e+00 1.826e+00 -3.816 0.000159 ***
## factor(mon)3 -7.138e+00 1.762e+00 -4.050 6.28e-05 ***
## factor(mon)4 -6.955e+00 1.990e+00 -3.496 0.000532 ***
## factor(mon)5 -1.004e+01 3.877e+00 -2.590 0.009980 **
## factor(mon)6 -6.875e+00 3.570e+00 -1.926 0.054932 .
## factor(mon)7 -3.589e+00 3.185e+00 -1.127 0.260672
## factor(mon)8 -7.678e-01 2.790e+00 -0.275 0.783305
## factor(mon)9 -1.371e+00 2.540e+00 -0.540 0.589588
## factor(mon)10 -2.238e+00 2.286e+00 -0.979 0.328171
## factor(mon)11 -1.977e+00 2.059e+00 -0.960 0.337572
## factor(mon)12 -1.114e+00 1.674e+00 -0.665 0.506367
## trend -7.571e-03 1.437e-02 -0.527 0.598650
## lag1 8.940e-02 2.394e-02 3.734 0.000219 ***
## lag3 8.024e-02 1.817e-02 4.417 1.33e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.856 on 359 degrees of freedom
## Multiple R-squared: 0.9861, Adjusted R-squared: 0.985
## F-statistic: 877.8 on 29 and 359 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 33
##
## data: Residuals
## LM test = 174.56, df = 33, p-value < 2.2e-16
The adjusted R-squared value is very high and the residuals are centered around zero, so the model can be a good fit.
Simple Linear Regression Model with sqrt Transformation
##
## Call:
## lm(formula = sqrt ~ price + visit_count + basket_count + favored_count +
## category_sold + category_visits + category_basket + category_favored +
## category_brand_sold + ty_visits + factor(w_day) + factor(mon) +
## lag1 + lag3, data = train9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3550 -0.2360 -0.0627 0.1745 4.8302
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.441e+01 1.309e+01 -1.101 0.27153
## price 2.393e-01 2.176e-01 1.100 0.27227
## visit_count 4.678e-05 7.203e-05 0.649 0.51647
## basket_count 1.017e-02 9.330e-04 10.904 < 2e-16 ***
## favored_count -6.264e-04 4.933e-04 -1.270 0.20494
## category_sold 5.239e-04 1.078e-04 4.860 1.76e-06 ***
## category_visits 2.675e-06 8.506e-07 3.145 0.00180 **
## category_basket 2.998e-06 1.912e-06 1.568 0.11774
## category_favored -4.896e-05 9.317e-06 -5.255 2.54e-07 ***
## category_brand_sold -4.739e-06 1.571e-05 -0.302 0.76309
## ty_visits 1.469e-08 3.011e-09 4.878 1.61e-06 ***
## factor(w_day)2 1.480e-01 1.361e-01 1.088 0.27753
## factor(w_day)3 2.979e-01 1.380e-01 2.158 0.03155 *
## factor(w_day)4 3.222e-01 1.386e-01 2.325 0.02066 *
## factor(w_day)5 3.394e-01 1.387e-01 2.448 0.01486 *
## factor(w_day)6 3.425e-01 1.376e-01 2.489 0.01325 *
## factor(w_day)7 2.680e-01 1.365e-01 1.964 0.05034 .
## factor(mon)2 -9.714e-02 3.247e-01 -0.299 0.76499
## factor(mon)3 -1.092e+00 2.771e-01 -3.941 9.76e-05 ***
## factor(mon)4 -2.470e+00 2.943e-01 -8.390 1.13e-15 ***
## factor(mon)5 -9.352e-01 3.219e-01 -2.906 0.00389 **
## factor(mon)6 -2.216e-01 2.781e-01 -0.797 0.42599
## factor(mon)7 -2.571e-02 2.692e-01 -0.096 0.92397
## factor(mon)8 1.470e-01 2.327e-01 0.632 0.52797
## factor(mon)9 -4.711e-02 2.250e-01 -0.209 0.83426
## factor(mon)10 -2.080e-01 2.245e-01 -0.927 0.35465
## factor(mon)11 -2.063e-01 2.255e-01 -0.915 0.36095
## factor(mon)12 -1.924e-01 1.960e-01 -0.981 0.32704
## lag1 7.630e-03 2.887e-03 2.642 0.00859 **
## lag3 3.889e-03 2.143e-03 1.815 0.07036 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7077 on 359 degrees of freedom
## Multiple R-squared: 0.9687, Adjusted R-squared: 0.9662
## F-statistic: 383.2 on 29 and 359 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 33
##
## data: Residuals
## LM test = 179.08, df = 33, p-value < 2.2e-16
The sqrt transformation also fits well according to the R-squared value and the residual analysis; however, its R-squared is lower than that of the untransformed model.
BoxCox Transformation
##
## Call:
## lm(formula = BoxCox ~ price + visit_count + basket_count + favored_count +
## category_visits + category_basket + ty_visits + factor(w_day) +
## factor(mon) + lag1 + lag3, data = train9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.8438 -0.3777 -0.0468 0.3162 7.4378
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.566e+01 2.409e+01 -0.650 0.51613
## price 2.039e-01 4.009e-01 0.509 0.61138
## visit_count 2.201e-04 1.253e-04 1.757 0.07969 .
## basket_count 7.421e-03 1.593e-03 4.658 4.48e-06 ***
## favored_count -1.526e-03 7.124e-04 -2.142 0.03282 *
## category_visits 1.860e-06 9.205e-07 2.021 0.04406 *
## category_basket 3.343e-06 1.030e-06 3.248 0.00127 **
## ty_visits 3.306e-08 5.284e-09 6.257 1.11e-09 ***
## factor(w_day)2 3.704e-01 2.515e-01 1.473 0.14164
## factor(w_day)3 5.500e-01 2.521e-01 2.181 0.02979 *
## factor(w_day)4 7.521e-01 2.526e-01 2.977 0.00311 **
## factor(w_day)5 7.995e-01 2.518e-01 3.175 0.00162 **
## factor(w_day)6 7.057e-01 2.535e-01 2.784 0.00564 **
## factor(w_day)7 5.575e-01 2.523e-01 2.209 0.02777 *
## factor(mon)2 6.805e-01 5.955e-01 1.143 0.25392
## factor(mon)3 -1.121e+00 5.107e-01 -2.196 0.02872 *
## factor(mon)4 -4.796e+00 5.381e-01 -8.912 < 2e-16 ***
## factor(mon)5 -1.438e+00 4.918e-01 -2.924 0.00368 **
## factor(mon)6 4.299e-02 3.332e-01 0.129 0.89739
## factor(mon)7 -5.215e-01 3.350e-01 -1.557 0.12041
## factor(mon)8 -4.757e-01 3.349e-01 -1.420 0.15637
## factor(mon)9 -5.032e-01 3.378e-01 -1.490 0.13714
## factor(mon)10 -5.092e-01 3.348e-01 -1.521 0.12920
## factor(mon)11 -4.708e-01 3.376e-01 -1.395 0.16397
## factor(mon)12 -5.100e-01 3.350e-01 -1.522 0.12878
## lag1 9.649e-03 5.306e-03 1.818 0.06982 .
## lag3 4.296e-03 3.952e-03 1.087 0.27766
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.31 on 362 degrees of freedom
## Multiple R-squared: 0.9339, Adjusted R-squared: 0.9291
## F-statistic: 196.6 on 26 and 362 DF, p-value: < 2.2e-16
##
## Breusch-Godfrey test for serial correlation of order up to 30
##
## data: Residuals
## LM test = 176.24, df = 30, p-value < 2.2e-16
The BoxCox transformation can also be a good fit, since its adjusted R-squared value is high.
In all linear models, however, the residuals are significantly correlated at lag 1, which is not desirable.
Arima Models
The ARIMA models are constructed with the auto.arima function, which is re-run every day; seasonality is set to TRUE, and the frequency is determined as seven by observing the ACF and PACF graphs.
The additive model, the multiplicative model, and the linear regression model are used for decomposition to obtain stationary data.
## [1] "The Additive Model"
##
## #######################################
## # KPSS Unit Root / Cointegration Test #
## #######################################
##
## The value of the test statistic is: 0.0074
## [1] "The Multiplicative Model"
##
## #######################################
## # KPSS Unit Root / Cointegration Test #
## #######################################
##
## The value of the test statistic is: 0.0767
## [1] "Linear Regression"
##
## #######################################
## # KPSS Unit Root / Cointegration Test #
## #######################################
##
## The value of the test statistic is: 0.0266
I used the additive model in the examination; however, the multiplicative model is also used in the predictions and its error rate is calculated, since it is significant at the 0.05 level.
Arima
## Series: decomposed$random
## ARIMA(0,0,2)(0,0,2)[7] with zero mean
##
## Coefficients:
## ma1 ma2 sma1 sma2
## 0.0175 -0.2200 0.1261 0.1426
## s.e. 0.0676 0.0786 0.0562 0.0597
##
## sigma^2 estimated as 101.4: log likelihood=-1426.16
## AIC=2862.31 AICc=2862.47 BIC=2882.05
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,2)(0,0,2)[7] with zero mean
## Q* = 63.195, df = 10, p-value = 8.962e-10
##
## Model df: 4. Total lags used: 14
Arima with Regressor
## Series: decomposed$random
## Regression with ARIMA(0,0,2)(1,0,2)[7] errors
##
## Coefficients:
## ma1 ma2 sar1 sma1 sma2 intercept xreg
## -0.0989 -0.4228 -0.8466 0.9099 0.2316 336.1997 -5.5940
## s.e. 0.0874 0.1040 0.0731 0.0875 0.0586 131.6755 2.1907
##
## sigma^2 estimated as 98.22: log likelihood=-1419.66
## AIC=2855.32 AICc=2855.71 BIC=2886.9
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(0,0,2)(1,0,2)[7] errors
## Q* = 75.554, df = 7, p-value = 1.107e-13
##
## Model df: 7. Total lags used: 14
Arima Combined with Linear Regression
## Series: residuals
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.1642 0.0050
## s.e. 0.0500 0.3364
##
## sigma^2 estimated as 30.95: log likelihood=-1218.56
## AIC=2443.13 AICc=2443.19 BIC=2455.02
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0) with non-zero mean
## Q* = 17.51, df = 8, p-value = 0.02522
##
## Model df: 2. Total lags used: 10
Predictions
## event_date actual sqrt_forecasted_sold BoxCox_forecasted_sold
## 1: 2021-06-18 46 39.60757 24.49287
## 2: 2021-06-19 26 40.73874 27.92878
## 3: 2021-06-20 15 37.68012 32.82206
## 4: 2021-06-21 20 18.12076 15.49253
## 5: 2021-06-22 47 19.00498 15.18145
## 6: 2021-06-23 40 23.86390 19.88549
## 7: 2021-06-24 37 22.30142 18.94182
## 8: 2021-06-25 20 21.13676 15.11249
## 9: 2021-06-26 27 15.26509 10.20279
## 10: 2021-06-27 20 29.71328 23.90117
## 11: 2021-06-28 26 16.29134 15.25011
## 12: 2021-06-29 19 29.99692 31.42128
## 13: 2021-06-30 20 29.01757 30.40515
## 14: 2021-07-01 14 20.43026 18.36339
## lm_forecasted_sold forecasted_lm9_arima add_arima_forecasted
## 1: 52.06382 48.22485 53.63868
## 2: 53.60874 51.82412 53.17379
## 3: 25.62132 21.49631 55.07792
## 4: 28.82384 31.34988 50.73318
## 5: 48.30872 42.48197 39.35532
## 6: 41.30296 42.75482 40.53021
## 7: 38.18560 35.97025 37.48216
## 8: 32.00821 34.95974 33.04259
## 9: 26.10528 23.68064 28.24629
## 10: 18.14426 20.09753 31.74235
## 11: 18.17181 16.38052 32.17666
## 12: 29.19631 30.60151 28.66079
## 13: 32.02116 29.43881 25.90133
## 14: 27.17501 26.75307 24.36589
## mul_arima_forecasted reg_add_arima_forecasted reg_mul_arima_forecasted
## 1: 47.78079 53.62648 49.07409
## 2: 38.06669 53.13962 37.64727
## 3: 74.23554 55.08468 74.99427
## 4: 37.16847 50.77435 46.26403
## 5: 32.56984 39.42061 31.35943
## 6: 53.92517 40.56712 53.99654
## 7: 38.03248 37.93881 39.39668
## 8: 27.91288 33.47772 29.19408
## 9: 23.01908 28.63979 23.45646
## 10: 40.43471 32.14585 41.19011
## 11: 22.61740 32.60472 23.04340
## 12: 23.99802 29.05841 24.46400
## 13: 35.12226 26.30029 35.80710
## 14: 24.92684 24.74350 25.40014
Error Rates
## model n mean sd CV FBias MAPE
## 1: sqrt_forecasted_sold 14 26.92857 11.11108 0.412613 0.03668779 0.4676869
## 2: BoxCox_forecasted_sold 14 26.92857 11.11108 0.412613 0.20583193 0.4702745
## 3: lm_forecasted_sold 14 26.92857 11.11108 0.412613 -0.24863937 0.3958312
## 4: forecasted_lm9_arima 14 26.92857 11.11108 0.412613 -0.20958624 0.3910197
## 5: add_arima_forecasted 14 26.92857 11.11108 0.412613 -0.41678291 0.6196845
## 6: mul_arima_forecasted 14 26.92857 11.11108 0.412613 -0.37880681 0.6777079
## 7: reg_add_arima_forecasted 14 26.92857 11.11108 0.412613 -0.42578766 0.6306575
## 8: reg_mul_arima_forecasted 14 26.92857 11.11108 0.412613 -0.41986102 0.7308202
## RMSE MAD MADP WMAPE
## 1: 13.64698 11.661328 0.4330467 0.4330467
## 2: 15.27308 12.805877 0.4755498 0.4755498
## 3: 10.77788 8.206740 0.3047596 0.3047596
## 4: 10.64224 8.284804 0.3076585 0.3076585
## 5: 16.88150 12.315465 0.4573382 0.4573382
## 6: 19.33931 13.314110 0.4944232 0.4944232
## 7: 16.98525 12.548624 0.4659967 0.4659967
## 8: 20.43287 14.469216 0.5373184 0.5373184
Next Day Prediction
Each day, the error rates of the model predictions are calculated over the last 14 days, and the model with the lowest WMAPE value is selected.
## add_arima mul_arima xreg_mul_arima xreg_add_arima
## 18.76924 15.58620 15.89121 19.15304
## forecast_lm forecast_lm_arima.1 BoxCox_lm Sqrt_lm
## 22.41879 15.34642 17.04839 18.15753
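The selection rule behind the output above can be sketched as below. The actuals and the `preds` forecast matrix are simulated placeholders (only three hypothetical model columns are shown); each column's WMAPE over the last 14 days is computed, and the column with the smallest value supplies the next-day forecast.

```r
# Simulated last-14-days actuals and competing model forecasts
set.seed(6)
actual <- rpois(14, 30)
preds  <- cbind(add_arima   = actual + rnorm(14, sd = 8),
                mul_arima   = actual + rnorm(14, sd = 4),
                forecast_lm = actual + rnorm(14, sd = 12))

# WMAPE per model over the 14-day window, then pick the minimum
wmape_by_model <- apply(preds, 2,
                        function(f) sum(abs(actual - f)) / sum(actual))
best <- names(which.min(wmape_by_model))
best  # this model's forecast is used for the next day
```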
In order to predict one-day-ahead sales of the different products, various ARIMA and linear regression models have been tried, and according to their performance on the test set, which consists of dates from 29 May 2021 to 11 June 2021, a different model has been selected for each product. As external data, Trendyol's campaign dates are included; however, since not every Trendyol campaign is listed on the website, some of the outliers may not be fully explained by the models, and further investigation of campaign data could improve them. Sales are also affected by the overall state of the economy, so more external data, such as the dollar exchange rate, could be included for improved accuracy.
Approaching each product individually is one of the strong sides of this work, even though it is a time-consuming task. Trying various models and measuring their performance on the test data is another strength of the models proposed for each product.
Overall, it can be said that the models work reasonably well; the deviation from the real values is not too large.